How to Correctly Get Multibyte Character Count Before a `preg_match()`?

Susan Sarandon
Release: 2024-12-08 09:11:15

Getting the Multibyte Character Count Before a preg_match() Match (PREG_OFFSET_CAPTURE Counts Bytes, Not Characters)

In UTF-8 encoded strings, the offsets that preg_match() returns with the PREG_OFFSET_CAPTURE flag are byte offsets, not character offsets. This holds even when the pattern uses the "u" modifier to treat the subject as UTF-8, so every multibyte character in front of the match makes the reported offset larger than the actual character position.
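To make the mismatch concrete, the following minimal sketch compares the offset reported by preg_match() with the character position of the same letter. The variable names ($byteOffset, $charOffset) are illustrative only, and the sketch assumes the mbstring extension is loaded. mb_strpos() is used here purely as a reference value for a literal needle; it is not a general replacement for a regex match.

```php
<?php
// "¡Hola!" where the leading "¡" is encoded as two bytes (0xC2 0xA1).
$str = "\xC2\xA1Hola!";

preg_match('/H/u', $str, $m, PREG_OFFSET_CAPTURE);

// Offset as reported by preg_match(): counted in bytes.
$byteOffset = $m[0][1];

// Position of the same letter counted in characters, for comparison.
$charOffset = mb_strpos($str, 'H', 0, 'UTF-8');

echo "byte offset: $byteOffset, character offset: $charOffset\n";
// prints "byte offset: 2, character offset: 1"
```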

Solution:

To convert a byte offset into a character offset, cut off the part of the subject that precedes the match with substr(), which works in bytes, and count the characters in that prefix with mb_strlen():

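$str = "\xC2\xA1Hola!"; // "¡Hola!": the leading "¡" takes 2 bytes in UTF-8
preg_match('/H/u', $str, $a_matches, PREG_OFFSET_CAPTURE);
// $a_matches[0][1] is the byte offset (2); count the characters in front of it
echo mb_strlen(substr($str, 0, $a_matches[0][1]), 'UTF-8'); // prints 1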
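If several capture groups are involved, the same conversion can be applied to every entry of the match array. The sketch below is one possible way to wrap this; the function name preg_match_char_offsets() is hypothetical (it is not part of PHP), and it assumes the subject is valid UTF-8 and that the mbstring extension is available.

```php
<?php
// Hypothetical helper: like preg_match() with PREG_OFFSET_CAPTURE, but the
// offsets in $matches are converted from bytes to UTF-8 characters.
// Returns whatever preg_match() returns (1, 0, or false on error).
function preg_match_char_offsets(string $pattern, string $subject, &$matches = null)
{
    $result = preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    if ($result !== 1) {
        return $result;
    }
    foreach ($matches as $key => $match) {
        // Groups that did not participate in the match carry offset -1; skip them.
        if ($match[1] >= 0) {
            // substr() cuts by bytes, mb_strlen() counts characters in the prefix.
            $matches[$key][1] = mb_strlen(substr($subject, 0, $match[1]), 'UTF-8');
        }
    }
    return $result;
}

// Usage: "¡Hola, señor!" contains two 2-byte characters before the match.
$subject = "\xC2\xA1Hola, se\xC3\xB1or!";
preg_match_char_offsets('/(o)r/u', $subject, $found);
echo $found[0][0], ' at character ', $found[0][1], "\n"; // prints "or at character 10"
```

Plain preg_match() would report 12 for the same match, because "¡" and "ñ" each occupy two bytes. Note that the helper rescans the prefix once per group, which is fine for short subjects but may be worth optimizing for very long strings.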

source:php.cn