
PHP PCRE regular expression analysis

little bottle
Release: 2023-04-06 11:30:02

This article looks at how PHP's PCRE regular expression functions handle UTF-8 text. It is offered as a reference and will hopefully be useful to interested readers.

1. Preface

A previous post analyzed character sets, so that material is not repeated here. Several PHP string functions, such as the mb_* family, treat their input as UTF-8 (one of the Unicode encodings) by default. With that in mind, let's get straight to the point.

2. Analysis of the PHP function mb_split

<?php
$preg_strings = '测、试、一、下';
$preg_str = mb_split('、', $preg_strings);
print_r($preg_str);

Print result:

Array
(
    [0] => 测
    [1] => 试
    [2] => 一
    [3] => 下
)

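By default, mb_split interprets both the pattern and the subject string as UTF-8 (the encoding it uses can be changed with mb_regex_encoding). The string $preg_strings is therefore split at each occurrence of the delimiter 、 on whole characters, that is, by Unicode code point rather than byte by byte.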
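To make the difference between bytes and characters concrete, here is a minimal sketch. It assumes the mbstring extension is loaded; the second subject string and the character-class pattern are illustrative additions, not part of the original example.

```php
<?php
// Assumption: the mbstring extension is available.
// Set the regex encoding explicitly so the result does not depend on php.ini.
mb_regex_encoding('UTF-8');

$preg_strings = '测、试、一、下';

// Each Chinese character and each 、 takes 3 bytes in UTF-8.
$byte_length = strlen($preg_strings);              // counts bytes
$char_length = mb_strlen($preg_strings, 'UTF-8');  // counts characters

// The first argument of mb_split is a regular expression, so a character
// class can accept either the ideographic comma (、) or the full-width comma (,).
$parts = mb_split('[、,]', '测、试,一、下');

echo $byte_length, ' bytes, ', $char_length, " characters\n"; // 21 bytes, 7 characters
print_r($parts);
```

Because the pattern is matched against characters rather than bytes, neither delimiter is ever split in the middle of its 3-byte sequence.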

3. Analysis of the PHP function preg_split

Split the string 测试一下 ("test it out") into individual characters:

<?php
$strings = '测试一下';
$mb_arr = preg_split('//u', $strings, -1, PREG_SPLIT_NO_EMPTY);
print_r($mb_arr);

The print result is as follows:

Array
(
    [0] => 测
    [1] => 试
    [2] => 一
    [3] => 下
)
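The u modifier is what makes the empty pattern split between characters. As a rough sketch of the contrast, the same call without u splits between bytes instead. The variable names are illustrative, and mb_str_split is shown only as an alternative that assumes PHP 7.4 or later with the mbstring extension.

```php
<?php
$strings = '测试一下';

// With u, the empty pattern matches between code points: 4 pieces.
$chars = preg_split('//u', $strings, -1, PREG_SPLIT_NO_EMPTY);

// Without u, the empty pattern matches between bytes: 12 one-byte pieces,
// none of which is a valid UTF-8 character on its own.
$bytes = preg_split('//', $strings, -1, PREG_SPLIT_NO_EMPTY);

// Alternative (assumes PHP >= 7.4 and mbstring): no regular expression needed.
$alt = mb_str_split($strings, 1, 'UTF-8');

echo count($chars), "\n";        // 4
echo count($bytes), "\n";        // 12
echo bin2hex($bytes[0]), "\n";   // e6, the first byte of 测
```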

4. The /u modifier in PCRE

In PHP, the delimiters of a regular expression can be #, %, /, or any other character that is not alphanumeric, a backslash, or whitespace.
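The choice of delimiter mainly affects which characters must be escaped inside the pattern. A small sketch, using a made-up URL as the subject:

```php
<?php
// Hypothetical subject string, chosen because it contains slashes.
$subject = 'http://example.com/a/b';

// With / as the delimiter, every literal slash must be escaped.
$with_slash = preg_match('/^http:\/\/[^\/]+\/a\/b$/', $subject);

// With # or % as the delimiter, the slashes need no escaping.
$with_hash    = preg_match('#^http://[^/]+/a/b$#', $subject);
$with_percent = preg_match('%^http://[^/]+/a/b$%', $subject);

var_dump($with_slash, $with_hash, $with_percent); // int(1) three times
```

All three patterns are equivalent; picking a delimiter that does not occur in the pattern simply keeps it easier to read.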

A regular expression is sometimes followed by one or more modifiers after the closing delimiter. So what do they mean?

For example:

%[\x{4e00}-\x{9fa5}]+%u

The trailing modifier u tells PCRE to treat both the pattern and the subject string as UTF-8, so matching works on Unicode code points rather than on individual bytes.

Example 1:

<?php
$strings = '测试一下';
// preg_match_all returns the number of full matches (1 here) and fills $match.
$is_true = preg_match_all('%[\x{4e00}-\x{9fa5}]+%u', $strings, $match);
print_r($match);

The print result is as follows:

Array
(
    [0] => Array
        (
            [0] => 测试一下
        )

)
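To see what the u modifier changes, the following sketch runs the same patterns with and without it. The variable names are illustrative. The failing call is wrapped in @ because, without u, PCRE rejects a \x{...} value above 0xFF when compiling the pattern, and PHP reports that as a warning and returns false.

```php
<?php
$strings = '测试一下';

// Without u, "." matches one byte: the 12-byte string gives 12 matches.
$byte_matches = preg_match_all('/./', $strings);

// With u, "." matches one whole code point: 4 matches.
$char_matches = preg_match_all('/./u', $strings);

// Without u, code points above 0xFF cannot be written as \x{...};
// the pattern fails to compile, so preg_match returns false.
$without_u = @preg_match('%[\x{4e00}-\x{9fa5}]+%', $strings);

// With u, the same pattern compiles and matches.
$with_u = preg_match('%[\x{4e00}-\x{9fa5}]+%u', $strings);

var_dump($byte_matches, $char_matches, $without_u, $with_u);
```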

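What does [\x{4e00}-\x{9fa5}] mean here?

In a PHP regular expression, \x introduces a hexadecimal character code. With the u modifier, \x{hhhh} matches the Unicode code point U+hhhh.

The CJK Unified Ideographs block, which contains the most commonly used Chinese characters, spans the code points 4E00 to 9FFF (in hexadecimal).

So the pattern expresses this as a range inside a character class []: [\x{4E00}-\x{9FFF}]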

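For the example string above, the two patterns [\x{4e00}-\x{9fa5}] and [\x{4E00}-\x{9FFF}] have the same effect. The hexadecimal digits are case-insensitive; the second range is slightly wider and also covers the code points from 9FA6 to 9FFF.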
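The two ranges only differ for code points between 9FA6 and 9FFF. A minimal sketch of that difference, assuming PHP 7 or later for the "\u{...}" string escape; the variable names and test characters are illustrative:

```php
<?php
// Assumption: PHP >= 7.0, which supports "\u{...}" in double-quoted strings.
$inside_both = "\u{4E2D}"; // 中, inside both ranges
$only_wider  = "\u{9FA6}"; // just past 9FA5, so only inside the wider range

// Single quotes keep \x{...} intact so that PCRE, not PHP, interprets it.
$narrow = '/^[\x{4e00}-\x{9fa5}]$/u';
$wide   = '/^[\x{4E00}-\x{9FFF}]$/u';

$results = [
    preg_match($narrow, $inside_both), // 1
    preg_match($wide, $inside_both),   // 1
    preg_match($narrow, $only_wider),  // 0
    preg_match($wide, $only_wider),    // 1
];

print_r($results);
```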

Source: cnblogs.com