This article analyzes how PHP's PCRE regular expression functions handle UTF-8 text. It is meant as a reference, and hopefully it helps anyone working on the same problem.
1. Preface
A previous post analyzed character sets, so this one does not cover that ground again. Several PHP functions can process strings as UTF-8-encoded Unicode, some by default and others only when asked. So without further ado, let's get straight to the point.
2. PHP function mb_split analysis
<?php
$preg_strings = '测、试、一、下';
$preg_str = mb_split('、', $preg_strings);
print_r($preg_str);
Print result:
Array( [0] => 测 [1] => 试 [2] => 一 [3] => 下)
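mb_split interprets both the pattern and the subject in the encoding set by mb_regex_encoding(), which defaults to UTF-8 in current PHP versions. The string $preg_strings is therefore split at each occurrence of the delimiter 、 (U+3001), which is matched as a whole character rather than byte by byte.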
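To make the encoding behavior visible, here is a minimal sketch. It assumes the mbstring extension is loaded and the source file is saved as UTF-8. The variable name $parts is only illustrative.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
// Set the regex encoding explicitly instead of relying on the default.
mb_regex_encoding('UTF-8');

$parts = mb_split('、', '测、试、一、下');
print_r($parts);

// The delimiter is a single character but occupies three bytes in UTF-8.
var_dump(strlen('、'));              // int(3)
var_dump(mb_strlen('、', 'UTF-8'));  // int(1)
```

Calling mb_regex_encoding() up front is a defensive choice: the split then does not depend on how the server happens to be configured.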
3. PHP function preg_split analysis
Split the string "测试一下" ("test it") into individual characters:
<?php
$strings = '测试一下';
$mb_arr = preg_split('//u', $strings, -1, PREG_SPLIT_NO_EMPTY);
print_r($mb_arr);
The print result is as follows:
Array( [0] => 测 [1] => 试 [2] => 一 [3] => 下 )
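The empty pattern matches at every position between units of the subject, and the u modifier decides whether a unit is a UTF-8 character or a single byte. The sketch below compares the two cases, assuming the source file is saved as UTF-8. The names $chars and $bytes are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8.
$s = '测试一下';

// With /u the empty pattern matches between code points.
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
var_dump(count($chars));   // int(4)

// Without /u the same pattern matches between bytes,
// and each of these characters is three bytes long in UTF-8.
$bytes = preg_split('//', $s, -1, PREG_SPLIT_NO_EMPTY);
var_dump(count($bytes));   // int(12)
```

The byte-level pieces are not valid UTF-8 on their own, which is why the u modifier is needed when splitting multibyte text this way.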
4. /u parsing in PCRE
In PHP, the delimiters around a regular expression can be #, %, /, and so on.
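As a small illustration of why the choice of delimiter matters, the sketch below matches the same path with three different delimiters. The sample path and variable names are made up for the example.

```php
<?php
$path = '/usr/local/bin';   // hypothetical sample input

// With / as the delimiter, every literal slash has to be escaped.
$a = preg_match('/^\/usr\/local/', $path);

// With # or % the slashes can be written as they are.
$b = preg_match('#^/usr/local#', $path);
$c = preg_match('%^/usr/local%', $path);

var_dump($a, $b, $c);   // int(1) int(1) int(1)
```

Picking a delimiter that does not occur in the pattern keeps the expression readable.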
Sometimes a regular expression is followed by one or more modifiers. So what do they mean?
For example:
%[\x{4e00}-\x{9fa5}]+%u
The trailing modifier u tells PCRE to treat both the pattern and the subject string as UTF-8, so matching works on whole characters instead of bytes.
Example 1:
<?php
$strings = '测试一下';
// preg_match_all returns the number of matches; the matched text goes into $match
$is_true = preg_match_all('%[\x{4e00}-\x{9fa5}]+%u', $strings, $match);
print_r($match);
The print result is as follows:
Array( [0] => Array ( [0] => 测试一下 ) )
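What does [\x{4e00}-\x{9fa5}] mean here?
In a PHP regular expression, \x{...} denotes a character by its hexadecimal code point.
Chinese characters (the CJK Unified Ideographs block) occupy the code points U+4E00 to U+9FFF, written in hexadecimal.
So the pattern uses a character class, written with [], that covers this interval: [\x{4E00}-\x{9FFF}]
For the example above, the two classes behave the same. They are not strictly identical: \x{9fa5} stops short of the end of the block, so the first class misses the ideographs between U+9FA6 and U+9FFF.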
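The two ranges agree on the sample string but differ at the upper edge of the block. The sketch below checks both, using U+9FA6 as a test character. It assumes PHP 7.0 or later for the "\u{...}" escape, and the variable names are illustrative.

```php
<?php
// Assumption: PHP 7.0+ (for the "\u{...}" escape); file saved as UTF-8.
// U+9FA6 lies inside the CJK Unified Ideographs block, just past U+9FA5.
$edge = "\u{9FA6}";

$narrowEdge = preg_match('%^[\x{4e00}-\x{9fa5}]+$%u', $edge);
$wideEdge   = preg_match('%^[\x{4e00}-\x{9fff}]+$%u', $edge);
var_dump($narrowEdge, $wideEdge);   // int(0) int(1)

// For the sample string, both ranges match.
$narrowText = preg_match('%^[\x{4e00}-\x{9fa5}]+$%u', '测试一下');
$wideText   = preg_match('%^[\x{4e00}-\x{9fff}]+$%u', '测试一下');
var_dump($narrowText, $wideText);   // int(1) int(1)
```

For everyday Chinese text either range is usually enough. The wider one is the safer choice if rare ideographs may appear.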