Basic PHP development tutorial: Pattern modifiers in regular expressions
1. Problem introduction
We have finished introducing regular expressions through metacharacters and atoms. A few special situations still need to be handled.
How do we match abc if it is at the beginning of the second line?
What if we do not want the regular expression to be greedy and match everything, but to match only part of the text?
For cases like these, we use pattern modifiers to extend what a regular expression can do.
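Commonly used pattern modifiers are:
i    case-insensitive matching
m    ^ and $ also match at the start and end of each line
s    the dot (.) also matches newline characters
x    whitespace and # comments in the pattern are ignored
e    evaluate the replacement as PHP code (preg_replace() only; removed in PHP 7.0)
U    inverts the greediness of quantifiers
A    the match must start at the beginning of the target string
D    $ matches only at the very end of the target string
Pattern modifiers are used as follows:
/regular expression/pattern modifiers
The modifiers are placed after the closing delimiter of the pattern. For example:
/\w+/s
Now that the format is clear, the most important thing is to understand and remember what each modifier does. We use code to compare the behavior with and without each modifier.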
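Several modifiers can be written together after the closing delimiter. The following is a minimal sketch of that; the subject string and variable names are made up for illustration, and # is used as an alternative delimiter only to show that the slash is not mandatory.

```php
<?php
// Illustrative subject: "ABC" sits at the start of the second line.
$subject = "first line\nABC on the second line";

// No modifiers: case-sensitive, and ^ matches only at the very start.
$plain = preg_match('/^abc/', $subject);

// i and m combined: ignore case, and let ^ match after each newline.
$combined = preg_match('/^abc/im', $subject, $m);

// Same pattern with # as the delimiter instead of /.
$hash = preg_match('#^abc#im', $subject);

var_dump($plain, $combined, $m[0], $hash);
```

The order of the modifier letters does not matter; /im and /mi behave the same.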
2. i: case-insensitive matching
<?php
// The i modifier is appended after the closing delimiter
$pattern = '/ABC/i';
$string = '8988abc12313';
$string1 = '11111ABC2222';

// Test both subjects against the same pattern
foreach ([$string, $string1] as $subject) {
    if (preg_match($pattern, $subject, $matches)) {
        echo 'Matched, result: ';
        var_dump($matches);
    } else {
        echo 'No match';
    }
}
?>
Conclusion: both $string and $string1 are matched successfully. With i after the closing delimiter, letter case is ignored when matching.
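Case-insensitivity can also be switched on inside the pattern with the inline option (?i), which is standard PCRE syntax. The sketch below, with made-up subject strings, shows that the inline form can apply to only part of a pattern.

```php
<?php
// Whole pattern case-insensitive, written in two equivalent ways.
$flag   = preg_match('/abc/i', 'xxABCxx');    // modifier after the delimiter
$inline = preg_match('/(?i)abc/', 'xxABCxx'); // inline option at the start

// Inline option in the middle: only "bc" is matched case-insensitively,
// the leading "a" must still be lowercase.
$partialHit  = preg_match('/a(?i)bc/', 'aBC');
$partialMiss = preg_match('/a(?i)bc/', 'ABC');

var_dump($flag, $inline, $partialHit, $partialMiss);
```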
3. m: treat the target as multiple lines
By default, the target string is treated as a single line during matching.
The "start of line" metacharacter (^) matches only at the beginning of the string, and the "end of line" metacharacter ($) matches only at the end of the string (or just before a final newline).
When this modifier is set, "start of line" and "end of line" match not only at the beginning and end of the entire string, but also immediately after and immediately before each newline character inside it.
Note: if the target string contains no "\n" character, or the pattern contains no ^ or $, setting this modifier has no effect.
Let’s verify this feature through experiments and code:
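In the first attempt, the match fails:
<?php
$pattern = '/^a\d+/';
// Three lines separated by "\n"; a9 is at the start of the second line
$string = "My future is in my own hands and I need to keep working hard\na9 is a nice string\nWhat should I do? Just keep pushing forward";
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, result: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
In the second attempt, let’s add m:
<?php
$pattern = '/^a\d+/m';
// Same three-line string as before
$string = "My future is in my own hands and I need to keep working hard\na9 is a nice string\nWhat should I do? Just keep pushing forward";
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, result: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>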
Result:
The match succeeds. /^a\d+/ requires a9 to be at the beginning of a line. With m set, the beginning of the second line also counts, so a9 is matched.
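The effect of m is easier to see when every match is collected with preg_match_all(). This is a small sketch with an invented three-line string, not part of the original example.

```php
<?php
// Illustrative text: lines one and three start with "a" plus digits.
$text = "a1 starts line one\nb2 starts line two\na3 starts line three";

// Without m, ^ can only match at offset 0, so only a1 is found.
preg_match_all('/^a\d+/', $text, $withoutM);

// With m, ^ also matches right after every "\n", so a1 and a3 are found.
preg_match_all('/^a\d+/m', $text, $withM);

print_r($withoutM[0]);
print_r($withM[0]);
```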
4. s: treat the target as one line
If this modifier is set, the dot metacharacter (.) in the pattern matches all characters, including newline characters.
The first time, without the s modifier:
<?php
$pattern = '/new future.+\d+/';
// There is a newline between "new future" and the digits
$string = "new future\n987654321";
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, result: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
The second time, with the s modifier added after the regular expression:
<?php
$pattern = '/new future.+\d+/s';
// Same string, with a newline after "new future"
$string = "new future\n987654321";
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, result: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
This time the match succeeds.
Conclusion:
1. In the target string, "new future" is followed by a line break.
2. By default, . (dot) matches any character except a newline, so the first attempt fails.
3. In the second attempt, the s modifier is added. With s, . (dot) matches every character, including the newline, so the match succeeds.
5. x: ignore whitespace in the pattern
1. If this modifier is set, whitespace characters in the pattern are completely ignored, except those that are escaped or inside a character class.
2. Characters between an unescaped # outside a character class and the next newline character are also ignored.
Let’s first experiment with the first rule, ignoring whitespace:
<?php
// The pattern contains spaces; the subject does not
$pattern = '/a b c /x';
$string = 'Learning English starts with abc';
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, result: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
This also matches successfully.
$pattern contains a space after each of a, b and c, while the abc in $string contains no spaces.
So x ignores whitespace characters in the pattern.
The second rule is harder to understand from its wording alone, so let’s look at an example:
<?php
// Pay close attention to the pattern: the # comment runs to the end of the line
$pattern = '/a b c #let me write a comment
/x';
$string = 'Learning English starts with abc';
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, result: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
The result is again a successful match.
Conclusion: the second feature of x is that everything from a # character up to the next newline is treated as a comment and ignored.
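In practice, x is mostly used to spread a long pattern over several commented lines. The sketch below uses a hypothetical YYYY-MM-DD date pattern for illustration, and also shows that a space inside a character class is still matched literally. Note that the comments must not contain the delimiter character (/ here), because PHP looks for the closing delimiter before the comments are stripped.

```php
<?php
// Hypothetical date pattern, one commented piece per line.
$pattern = '/
    (\d{4})   # year: four digits
    -         # literal hyphen
    (\d{2})   # month: two digits
    -         # literal hyphen
    (\d{2})   # day: two digits
/x';

$matched = preg_match($pattern, 'Released on 2003-04-15.', $parts);

// A space inside a character class is NOT ignored under x.
$spaced   = preg_match('/a [ ] b/x', 'a b'); // pattern requires "a", space, "b"
$unspaced = preg_match('/a [ ] b/x', 'ab');  // no space in the subject

var_dump($matched, $parts, $spaced, $unspaced);
```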
6. e: evaluate the replacement, and backreferences
The e modifier only has an effect with preg_replace(). It made preg_replace() substitute the backreferences into the replacement string and then evaluate the result as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() should be used instead. What remains useful, and what this section demonstrates, is the backreference: taking the content matched inside the parentheses of the regular expression and putting it into the replacement.
mixed preg_replace(mixed $pattern, mixed $replacement, mixed $subject)
What preg_replace() does: it uses the regular expression $pattern to search the string $subject, and replaces each match with $replacement.
Let’s review earlier material before the formal explanation. We deliberately put parentheses around each atom to be matched:
<?php
// Each part of the pattern is wrapped in parentheses
$pattern = '/(\d+)([a-z]+)(\d+)/';
$string = '987abc321';
if (preg_match($pattern, $string, $match)) {
    echo 'Matched, result: ';
    var_dump($match);
} else {
    echo 'No match';
}
?>
As we saw when discussing parentheses, the content matched by each parenthesized group is also stored in the result array. Element 0 holds the whole match, 987abc321, and the following elements hold 987, abc and 321.
Next, let’s use those captured groups as backreferences in a replacement:
<?php
$string = '{April 15, 2003}';
// \w matches letters, digits and underscores; \d matches one digit (0-9);
// + requires the preceding element to occur one or more times in a row
$pattern = '/\{(\w+) (\d+), (\d+)\}/i';
// $2 is replaced with the text captured by the 2nd parenthesized subpattern
$replacement = '$2';
echo preg_replace($pattern, $replacement, $string);
?>
The output is 15.
Conclusion:
In the above example, $2 refers to the second parenthesized group, which is the first (\d+) in the regular expression and captured 15. Because the replacement is $2, the captured text is taken out and used to replace the whole match, {April 15, 2003}.
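Since the e modifier no longer exists in PHP 7 and later, any replacement that needs to run code goes through preg_replace_callback(). The following is a minimal sketch reusing the date string from the example; the "next day" logic and the variable names are invented for illustration and do no real calendar arithmetic.

```php
<?php
$string  = '{April 15, 2003}';
$pattern = '/\{(\w+) (\d+), (\d+)\}/';

// Plain backreferences: reorder the captured groups, no code involved.
$reordered = preg_replace($pattern, '$3-$1-$2', $string);

// Callback: compute the replacement from the captures.
// $m[0] is the whole match, $m[1]..$m[3] are the groups.
$nextDay = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Naive increment of the day number, for illustration only.
        return $m[1] . ' ' . ((int) $m[2] + 1) . ', ' . $m[3];
    },
    $string
);

echo $reordered, "\n", $nextDay, "\n";
```

The callback receives the same array that preg_match() would fill, so the group numbering matches the $1, $2, $3 backreferences.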
7. U: greedy mode control
Quantifiers in regular expressions are greedy by default, that is, they match as much as possible.
Let’s take a look at how greedy the regular expression is:
<?php
$pattern = '/<div>.*<\/div>/';
$string = "<div>Hello</div><div>I am</div>";
if (preg_match($pattern, $string, $match)) {
    echo 'Matched, result: ';
    var_dump($match);
} else {
    echo 'No match';
}
?>
The result shows that the match runs from the first <div> before "Hello" all the way to the last </div> after "I am", that is, the entire string <div>Hello</div><div>I am</div>. The longest possible match was made.
For the same piece of code, let’s add an uppercase U and see the effect:
<?php
$pattern = '/<div>.*<\/div>/U';
$string = "<div>Hello</div><div>I am</div>";
if (preg_match($pattern, $string, $match)) {
    echo 'Matched, result: ';
    var_dump($match);
} else {
    echo 'No match';
}
?>
Observe the output:
This time only the following is matched:
<div>Hello</div>
In this way, the greedy behavior of the quantifier is turned off, and the match stops at the nearest closing </div>.
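The U modifier inverts the greediness of every quantifier in the pattern. A more common alternative is to make a single quantifier lazy by writing ? after it. The sketch below compares the variants on the same string; the variable names are illustrative.

```php
<?php
$html = '<div>Hello</div><div>I am</div>';

// Default: .* is greedy and runs to the last </div>.
preg_match('/<div>.*<\/div>/', $html, $greedy);

// U modifier: every quantifier becomes lazy.
preg_match('/<div>.*<\/div>/U', $html, $ungreedy);

// Lazy quantifier .*? without any modifier: same result as U here.
preg_match('/<div>.*?<\/div>/', $html, $lazy);

// With U set, a trailing ? switches that quantifier back to greedy.
preg_match('/<div>.*?<\/div>/U', $html, $flipped);

var_dump($greedy[0], $ungreedy[0], $lazy[0], $flipped[0]);
```

Using .*? keeps the choice local to one quantifier, which is usually easier to read than a modifier that flips the meaning of the whole pattern.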
8. A: match from the beginning of the target string
This modifier anchors the pattern, so its effect is similar to that of the ^ (circumflex) metacharacter.
<?php
$pattern = '/this/A';
// "this" is not at the start of $string
$string = 'hello this is a ';
//$string1 = 'this is a ';
if (preg_match($pattern, $string, $match)) {
    echo 'Matched, result: ';
    var_dump($match);
} else {
    echo 'No match';
}
?>
Conclusion:
With the A modifier, $string cannot be matched; without it, $string can be matched.
With the A modifier, $string1 can be matched, because the match must start at the beginning of the string and $string1 begins with this.
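A and ^ stop being interchangeable once m or a start offset is involved. The sketch below uses invented strings; the last two calls pass the fifth argument of preg_match(), the byte offset at which the search starts.

```php
<?php
$subject = "hello\nthis is a test";

// With m, ^ also matches at the start of the second line.
$caretM = preg_match('/^this/m', $subject);

// A is not affected by m: the match must begin where the search begins.
$anchoredM = preg_match('/this/Am', $subject);

// With a start offset, A anchors at that offset ("this" starts at byte 6)...
$anchoredAtOffset = preg_match('/this/A', 'hello this', $m, 0, 6);

// ...while ^ still means the start of the whole subject, so it fails.
$caretAtOffset = preg_match('/^this/', 'hello this', $m2, 0, 6);

var_dump($caretM, $anchoredM, $anchoredAtOffset, $caretAtOffset);
```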
9. D: $ matches only at the very end, with no trailing newline allowed
If this modifier is set, the dollar metacharacter in the pattern matches only at the very end of the target string. Without it, the dollar sign also matches immediately before the final character if that character is a newline (but not before any other newline). This modifier is ignored if m is set.
<?php
$pattern = '/\w+this$/';
//$pattern1 = '/\w+this$/D';
// Note the newline after "this"
$string = "hellothis\n";
if (preg_match($pattern, $string, $match)) {
    echo 'Matched, result: ';
    var_dump($match);
} else {
    echo 'No match';
}
?>
Conclusion:
1. When $pattern (without D) is matched against $string, the match succeeds even though there is a newline after this in $string.
2. When $pattern1 (with D) is matched against $string, the newline after this makes the match fail.
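The difference matters most in input validation, where a trailing newline can slip past a plain $. The sketch below compares the variants; \z is the PCRE assertion for the very end of the subject, and the sample strings are made up.

```php
<?php
$withNewline = "hellothis\n";

// Plain $: also matches just before a final newline.
$plain = preg_match('/\w+this$/', $withNewline);

// D: $ matches only at the very end, so the trailing newline breaks the match.
$dollarEndOnly = preg_match('/\w+this$/D', $withNewline);

// D with no trailing newline: matches as usual.
$noNewline = preg_match('/\w+this$/D', 'hellothis');

// \z asserts the very end of the subject regardless of modifiers.
$slashZ = preg_match('/\w+this\z/', $withNewline);

// Typical pitfall: a "digits only" check that accepts "123\n".
$looseCheck  = preg_match('/^\d+$/', "123\n");
$strictCheck = preg_match('/^\d+$/D', "123\n");

var_dump($plain, $dollarEndOnly, $noNewline, $slashZ, $looseCheck, $strictCheck);
```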