How to Efficiently Iterate Over Unicode Codepoints in Java Strings?

Mary-Kate Olsen
Release: 2024-11-02 06:49:02

Iterating over Unicode Codepoints in Java Strings

The String class provides the codePointAt(int) method for reading Unicode codepoints, but its argument is a char index (an offset in UTF-16 code units), not a codepoint offset. This raises two concerns: how to handle indexes that land on a surrogate, and whether scanning the string char by char is an efficient way to iterate.
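To make the indexing issue concrete, here is a small sketch of what a naive char-by-char scan returns. The class and method names (CharIndexDemo, naiveScan) are hypothetical and exist only for illustration; the String and Character calls are standard JDK methods.

```java
final class CharIndexDemo {
    // Calls codePointAt at every char index, as a naive scan would.
    static int[] naiveScan(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.codePointAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());                      // 4 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 codepoints

        // Index 1 yields the full codepoint U+1F600, but index 2 lands on
        // the low surrogate and yields U+DE00, which is not a real character.
        for (int cp : naiveScan(s)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

The scan visits four positions for three codepoints, so the surrogate half at index 2 would be processed as if it were a character of its own.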

Improved Iteration Solution

Java's String API exposes text as a sequence of UTF-16 code units (chars). Characters outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs, that is, two chars per codepoint. For efficient iteration, consider using the following canonical approach:

<code class="java">final int length = s.length();
for (int offset = 0; offset < length; ) {
   final int codepoint = s.codePointAt(offset);

   // process the codepoint

   offset += Character.charCount(codepoint);
}</code>

This approach correctly handles surrogate pairs for characters outside the BMP. Character.charCount(codepoint) returns 1 for a BMP codepoint and 2 for a supplementary one, so the offset always advances to the start of the next codepoint and the string is traversed in a single pass.
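As a rough, runnable sketch, the loop can be wrapped in a helper and compared with the codePoints() stream that String has offered since Java 8. The class and method names (CodePointIteration, viaOffsetLoop, viaStream) are assumptions made for this example, and collecting into a List is only a stand-in for "process the codepoint".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class CodePointIteration {
    // The offset loop: advance by 1 or 2 chars depending on the codepoint.
    static List<Integer> viaOffsetLoop(String s) {
        List<Integer> out = new ArrayList<>();
        final int length = s.length();
        for (int offset = 0; offset < length; ) {
            final int codepoint = s.codePointAt(offset);
            out.add(codepoint); // stand-in for real processing
            offset += Character.charCount(codepoint);
        }
        return out;
    }

    // Alternative (Java 8+): let the JDK walk the codepoints as an IntStream.
    static List<Integer> viaStream(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(viaOffsetLoop(s)); // [97, 128512, 98]
        System.out.println(viaStream(s));     // [97, 128512, 98]
    }
}
```

Both variants return an unpaired surrogate as its own value rather than failing, so input that may contain malformed UTF-16 still needs an explicit check if that matters to the caller.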

source:php.cn