Basic PHP Development Tutorial: Metacharacters in Regular Expressions

1. Metacharacters

A new requirement: \d matches a single digit. What if I want to match eight digits, ten digits, or any number of digits?

This is where metacharacters come in. An atom on its own matches only one character, which becomes a problem when we need to match several.
Metacharacters modify atoms so that they can do more.

Don't be intimidated by what follows. Everything will become clear as we work through the experiments one at a time. The main point is that metacharacters make patterns far more flexible.

Let’s see:


  • + matches the preceding character one or more times

The code is as follows:

<?php
$zz = '/\d+/';
$string = "迪奥和奥迪250都是我最爱";
// Later, also try a string that contains no digits 0-9
//$string = "迪奥和奥迪都是我最爱";
if(preg_match($zz, $string, $matches)){
    echo 'Matched, the result is: ';
    var_dump($matches);
}else{
    echo 'No match';
}
?>

The match succeeds, which demonstrates the + in \d+: \d matches a digit, and + requires the preceding character to appear at least once.
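To make the behavior of + concrete, here is a minimal sketch that runs the same pattern against a few inputs. The sample strings and the $results array are illustrative assumptions, not part of the original lesson.

```php
<?php
$pattern = '/\d+/';
// Hypothetical sample inputs; the second one contains no digits at all.
$samples = ['Audi 250', 'no digits here', 'a1b22c333'];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 otherwise;
    // $m[0] holds the first (leftmost) run of digits.
    $results[$sample] = preg_match($pattern, $sample, $m) ? $m[0] : null;
}
var_dump($results);
```

Note that preg_match stops at the first match, so the last sample yields only its first run of digits.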

  • * matches the preceding character zero or more times

The code is as follows:
<?php
$zz = '/\w*/';
 
$string = "!@!@!!@#@!$@#!";
//$string1 = "!@#!@#!abcABC#@#!";
if(preg_match($zz, $string, $matches)){
    echo 'Matched, the result is: ';
    var_dump($matches);
}else{
    echo 'No match';
}
?>

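Note that both $string and the commented-out $string1 match successfully. \w matches 0-9, A-Z, a-z and _, and * means the preceding \w does not have to be present at all; if it is present, it may appear once or many times. Because zero occurrences are allowed, the pattern already succeeds at the very start of each string, so the matched text in both cases is an empty string.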
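One way to see what * actually matched is to ask preg_match for offsets. The sketch below assumes the PREG_OFFSET_CAPTURE flag, which turns each entry of the match array into a pair of matched text and byte offset; the variable names are illustrative.

```php
<?php
// Single quotes avoid any accidental variable interpolation.
$subject = '!@#!@#!abcABC#@#!';

// With *, zero word characters are acceptable, so the match is empty at offset 0.
preg_match('/\w*/', $subject, $m, PREG_OFFSET_CAPTURE);
$starMatch = $m[0];

// With +, at least one word character is required, so the engine moves on to "abcABC".
preg_match('/\w+/', $subject, $m, PREG_OFFSET_CAPTURE);
$plusMatch = $m[0];

var_dump($starMatch, $plusMatch);
```

If the goal is to find the letters, + is usually the better choice than *.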

  • ? matches the preceding character zero or one time, making it optional

The code is as follows:
<?php
$zz = '/ABC\d?ABC/';
$string = "ABC1ABC";
// Later, also try the other cases, such as no digit 0-9 in the middle
//$string1 = "ABC888888ABC";
//$string2 = "ABCABC";
if(preg_match($zz, $string, $matches)){
    echo 'Matched, the result is: ';
    var_dump($matches);
}else{
    echo 'No match';
}
?>

$string and $string2 match successfully, but $string1 does not.

The pattern requires ABC on both sides with a digit (0-9) in between. The digit is optional, but there cannot be more than one.
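The three cases can also be checked in a single run. This is a small sketch; the $outcome array is an illustrative name, not part of the lesson.

```php
<?php
$pattern = '/ABC\d?ABC/';
$subjects = ['ABC1ABC', 'ABC888888ABC', 'ABCABC'];

$outcome = [];
foreach ($subjects as $subject) {
    // Store true when the pattern matches, false otherwise.
    $outcome[$subject] = preg_match($pattern, $subject) === 1;
}
var_dump($outcome);
```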


  • . (dot) matches all characters except \n

    <?php
    $zz = '/gg.+gg/';
    $string = "ABC1ABC";
    if(preg_match($zz, $string, $matches)){
        echo 'Matched, the result is: ';
        var_dump($matches);
    }else{
        echo 'No match';
    }
    ?>
The match fails because the string must contain gg both before and after, with at least one character in between; ABC1ABC contains no gg at all.
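The sketch below tries the same pattern on a string that does contain gg on both sides, and on one where the only character in between is a newline. The extra subjects and the s modifier are assumptions added for illustration; the lesson itself does not use them.

```php
<?php
$subjects = ['ABC1ABC', 'gg123gg', "gg\ngg"];

$dotResults = [];
foreach ($subjects as $subject) {
    // . refuses to match "\n", so the third subject fails.
    $dotResults[] = preg_match('/gg.+gg/', $subject);
}

// With the s modifier, . also matches "\n".
$dotAll = preg_match('/gg.+gg/s', "gg\ngg");

var_dump($dotResults, $dotAll);
```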

2. | (vertical bar): "or", with the lowest precedence. We will use experiments to look at both its precedence and "or" matching.

<?php
 
$zz = '/abc|bcd/';
$string1 = "abccd";
$string2 = "ggggbcd";
 
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

Let's take a look:

1. My original intention was to match abccd or abbcd. However, when $string1 and $string2 are tested, the matched results are abc and bcd respectively.

2. The "or" is applied between the whole of abc and the whole of bcd. In other words, | has lower precedence than the concatenation of adjacent characters.

So the question is: what should I do if I want to match abccd or abbcd in the example above?

You need to use () to change the precedence.

The code is as follows:

<?php
$zz = '/ab(c|b)cd/';
$string1 = "起来abccd阅兵";
$string2 = "ggggbcd";
$string3 = '中国abbcd未来';
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

The results are as follows:

For $string1, $matches contains two elements: index 0 holds abccd and index 1 holds c.

Conclusion:

1. It does match abccd or abbcd ($string1 or $string3).

2. However, the $matches array now has one additional element, with index 1.

3. Whenever the content inside () matches, the text it matched is stored in the array element with index 1.
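To compare the two patterns side by side, the following sketch runs both against the three subjects used above and collects the match arrays. The $plain and $grouped arrays are illustrative names.

```php
<?php
$subjects = ['起来abccd阅兵', 'ggggbcd', '中国abbcd未来'];

$plain = [];
$grouped = [];
foreach ($subjects as $subject) {
    // Without parentheses: "abc" OR "bcd".
    $plain[] = preg_match('/abc|bcd/', $subject, $m) ? $m : null;
    // With parentheses: "ab", then "c" OR "b", then "cd".
    $grouped[] = preg_match('/ab(c|b)cd/', $subject, $m) ? $m : null;
}
var_dump($plain, $grouped);
```

In $grouped, index 1 of each successful match holds the alternative that was chosen inside the parentheses.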

3. ^ (circumflex): the subject must start with the text that follows ^

<?php
 
$zz = '/^小明\w+/';
$string1 = "小明abccdaaaasds";
// $string2 does not start with 小明
$string2 = "明abccdaaaasds";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

The experiments lead to the following conclusions:

1. $string1 matches successfully; $string2 does not.

2. This is because $string1 starts with the specified characters,

3. while $string2 does not start with the characters that follow ^.

4. In plain words, this pattern means: starts with 小明 (Xiao Ming), followed by at least one character from a-z, A-Z, 0-9 or _.
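The effect of ^ is easiest to see next to the same pattern without it. In this sketch the third subject and the unanchored pattern are assumptions added for comparison.

```php
<?php
$subjects = ['小明abccdaaaasds', '明abccdaaaasds', 'hello小明abc'];

$anchored = [];
$unanchored = [];
foreach ($subjects as $subject) {
    // With ^, 小明 must be at the very start of the subject.
    $anchored[] = preg_match('/^小明\w+/', $subject);
    // Without ^, 小明 may appear anywhere in the subject.
    $unanchored[] = preg_match('/小明\w+/', $subject);
}
var_dump($anchored, $unanchored);
```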

4. $ (dollar sign): the subject must end with the text that precedes $

<?php
$zz = '/\d+努力$/';
$string1 = "12321124333努力";
// $string2 does not end with 努力
$string2 = "12311124112313力";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

Let's run it. After looking at the results, we come to these conclusions:

1. $string1 matches successfully, but $string2 does not.

2. What precedes $ is \d+ followed by the Chinese word 努力 ("effort").

3. The subject therefore has to end with that whole sequence. \d is a digit 0-9, and + requires at least one of them.
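A common use of $ is together with ^, so that the pattern has to cover the subject from start to end. The pattern /^\d+$/ and the inputs below are assumptions chosen for illustration and do not come from the lesson.

```php
<?php
$pattern = '/^\d+$/';
$subjects = ['12345', '123abc', 'abc123'];

$onlyDigits = [];
foreach ($subjects as $subject) {
    // True only when every character of the subject is a digit.
    $onlyDigits[$subject] = preg_match($pattern, $subject) === 1;
}
var_dump($onlyDigits);
```

One caveat: in PCRE, $ also accepts a single newline at the very end of the subject unless the D modifier is added.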

5. \b and \B: word boundaries and non-word boundaries

Let's explain what a boundary is:

1. The subject string itself has boundaries: when the string starts or ends with a word character, its start and end count as word boundaries.

2. In "this is", this is an English word followed by a space; the space means the word has ended, so a word boundary has been reached.

  • \b is a word boundary: it matches at a position where a word starts or ends, that is, between a \w character and a non-\w character, or next to a \w character at the start or end of the string.

  • \B is a non-word boundary: it matches at any position that is not a word boundary.

    <?php
    $zz = '/\w+\b/';
    $string1 = "this is a apple";
    $string2 = "thisis a apple";
    $string3 = "thisisaapple";

    if (preg_match($zz, $string1, $matches)) {
        echo 'Matched, the result is: ';
        var_dump($matches);
    } else {
        echo 'No match';
    }
    ?>

Conclusion:

1. $string1, $string2 and $string3 all match successfully.

2. For $string1, the match is this; the space after it provides the boundary.

3. For $string2, the match is thisis; the space after it provides the boundary.

4. For $string3, the match is thisisaapple; it runs to the end of the string, which is also a boundary, so the match succeeds.
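Word boundaries are often used to pick out whole words. The sketch below relies on preg_match_all and the PREG_OFFSET_CAPTURE flag, neither of which appears in the lesson; the variable names are illustrative.

```php
<?php
$subject = 'this is a apple';

// Collect every run of word characters that is delimited by boundaries.
preg_match_all('/\b\w+\b/', $subject, $words);
$wordList = $words[0];

// \bis\b skips the "is" inside "this" and finds the standalone word only.
preg_match('/\bis\b/', $subject, $m, PREG_OFFSET_CAPTURE);
$isOffset = $m[0][1];

var_dump($wordList, $isOffset);
```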

Let’s experiment with non-word boundaries:

<?php
$zz = '/\Bthis/';
$string1 = "hellothis9";
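// Two alternative subjects to try later; in both, "this" begins at a word boundary
//$string2 = "hello this9";
//$string2 = "this9中国万岁";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';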
}
?>

Summary:

$string1 matches successfully, but both commented-out $string2 variants fail.

\B comes directly before this, so this cannot begin at a word boundary (after a space, or at the start of the string).
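Since \b and \B are opposites, running both against the same subjects gives mirrored results. This is a minimal sketch; the \bthis pattern and the array names are added for comparison only.

```php
<?php
$subjects = ['hellothis9', 'hello this9', 'this9中国万岁'];

$nonBoundary = [];
$boundary = [];
foreach ($subjects as $subject) {
    // \Bthis: "this" must NOT begin at a word boundary.
    $nonBoundary[] = preg_match('/\Bthis/', $subject);
    // \bthis: "this" MUST begin at a word boundary.
    $boundary[] = preg_match('/\bthis/', $subject);
}
var_dump($nonBoundary, $boundary);
```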

6. {m}: the preceding character must appear exactly m times

<?php
$zz = '/喝\d{3}酒/';
$string1 = "喝988酒";
//$string2 = "喝98811酒";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

Conclusion: in the example above, \d{3} stipulates that a digit 0-9 must appear exactly 3 times, no more and no fewer, so the commented-out $string2 fails.
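The "no more" part depends on the characters around \d{3}. The following sketch, using a bare /\d{3}/ pattern that is not in the lesson, shows that without 喝 and 酒 on either side the pattern happily matches the first three digits of a longer run.

```php
<?php
$subject = '喝98811酒';

// Surrounded by 喝 and 酒: five digits are too many, so there is no match.
$withContext = preg_match('/喝\d{3}酒/', $subject);

// On its own: the first three digits of the run match.
preg_match('/\d{3}/', $subject, $m);
$bare = $m[0];

var_dump($withContext, $bare);
```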

7. {n,m}: the preceding character may appear n to m times

<?php
$zz = '/喝\d{1,3}酒/';
$string1 = "喝9酒";
//$string2 = "喝988酒";
 
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

Conclusion: in the example above, \d{1,3} stipulates that a digit 0-9 may appear 1, 2 or 3 times. Any other count fails.

8. {m,}: at least m times, with no upper limit

<?php
$zz = '/喝\d{2,}/';
$string1 = "喝9";
//$string2 = "喝98";
//$string3 = "喝98122121";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

Conclusion:
In the example above, \d{2,} requires the digit 0-9 after 喝 to appear at least twice, with no upper limit. Therefore $string1 fails to match, while $string2 and $string3 match successfully.
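The counted quantifiers are greedy: they take as many characters as their upper limit allows. The sketch below illustrates this with a few bare patterns and inputs chosen for the purpose; none of them are from the lesson.

```php
<?php
// {1,3} takes up to three digits, even though one would be enough.
preg_match('/\d{1,3}/', '98811', $m);
$rangeMatch = $m[0];

// {2,} has no upper limit, so it takes the whole run of digits.
preg_match('/\d{2,}/', '98122121', $m);
$openMatch = $m[0];

// A single digit after 喝 does not satisfy {2,}.
$tooFew = preg_match('/喝\d{2,}/', '喝9');

// + behaves like {1,}; likewise * is {0,} and ? is {0,1}.
preg_match('/\d{1,}/', 'a250', $m);
$likePlus = $m[0];

var_dump($rangeMatch, $openMatch, $tooFew, $likePlus);
```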

