
How to Determine the True Length of a UTF-8 Encoded std::string in C++?

Linda Hamilton
Release: 2024-10-27 20:43:30


Determining the True Length of a UTF-8 Encoded std::string

In C++, a std::string is a sequence of char values, each occupying one byte of memory. In UTF-8, however, a single character (strictly, a Unicode code point) may be encoded as a sequence of one to four bytes. As a result, str.length() (like str.size()) reports the number of bytes, which can be larger than the number of characters in the string.

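UTF-8 encodes each code point as a sequence of 1 to 4 bytes. The number of bytes depends on the code point's value, and the high bits of the first (lead) byte announce the length of the sequence:

  • 0x00000000 - 0x0000007F: 1 byte (lead byte 0xxxxxxx)
  • 0x00000080 - 0x000007FF: 2 bytes (lead byte 110xxxxx)
  • 0x00000800 - 0x0000FFFF: 3 bytes (lead byte 1110xxxx)
  • 0x00010000 - 0x0010FFFF: 4 bytes (lead byte 11110xxx)

Every byte after the lead byte is a continuation byte of the form 10xxxxxx. (The 4-byte pattern could in principle reach 0x001FFFFF, but RFC 3629 restricts UTF-8 to code points up to 0x0010FFFF.)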

To count the characters (code points) in a UTF-8 encoded std::string, you can use the following approach:

  1. Walk through the string one byte at a time, for example with a pointer s and the dereference operator *s.
  2. For each byte, use the bitwise & operator to keep only the top two bits (b & 0xC0) and check whether the result equals 0x80, i.e. whether the byte matches the continuation pattern 10xxxxxx.

If the byte does not match the continuation pattern, it begins a new code point, so increment the length count. Continuation bytes are skipped.

Here's an example implementation. It walks a NUL-terminated buffer, so it stops at the first embedded '\0' byte:

```cpp
const char* s = str.c_str();  // str: the std::string to measure
int len = 0;
while (*s) len += (*s++ & 0xc0) != 0x80;  // count bytes that are not 10xxxxxx
```

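This approach counts Unicode code points, which is usually what "the length of a UTF-8 string" is taken to mean, and it is useful for character counting, string manipulation, and data parsing. Two caveats apply: it assumes the input is valid UTF-8 (malformed bytes are counted without complaint), and a code point is not always what a reader perceives as one character. For example, "é" written as "e" followed by the combining accent U+0301 counts as two code points.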


source:php.cn