Regular Expressions for PHP Beginners
1. Delimiters
What is a delimiter?
A delimiter marks a boundary: the expression must be written between the opening and the closing delimiter.
In regular expressions, // is the usual delimiter pair, and the expression goes between the two slashes, as in /a-z/.
2. Which characters can be delimiters?
Any character other than a letter, a digit, a backslash (\), or whitespace can be used as a delimiter, for example ||, //, {}, or !!. Unless there is a special need, we use // as the delimiter.
3. Composition of regular expressions
A standard regular expression consists of three parts:
(1) Delimiter
(2) Expression
(3) Modifier
Delimiter: wraps the expression. It can be any character other than a letter, a digit, a backslash, or whitespace. The most commonly used delimiter is "/".
Expression: made up of special characters (metacharacters) and ordinary characters (literal text).
Modifier: modifiers in PHP regular expressions change how the expression behaves so that it better fits what you need. (Note: modifiers are case-sensitive, which means "e" is not the same as "E".)
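As a rough illustration of the three parts, the sketch below writes the same expression with two different delimiters and then adds the "i" modifier after the closing delimiter. The variable names and the sample text are made up for this example.

```php
<?php
// Hypothetical sample text, chosen only for this illustration.
$subject = 'Visit HTTP://example.com/docs today';

// Delimiter "/": every literal slash in the expression must be escaped.
$withSlash = '/http:\/\/\w+/';
// Delimiter "#": the slashes can be written as they are.
$withHash = '#http://\w+#';
// Same expression, plus the "i" modifier after the closing delimiter.
$ignoreCase = '#http://\w+#i';

$r1 = preg_match($withSlash, $subject);             // case-sensitive
$r2 = preg_match($withHash, $subject);              // same pattern, other delimiter
$r3 = preg_match($ignoreCase, $subject, $matches);  // case-insensitive

var_dump($r1, $r2, $r3, $matches);
```

The first two calls behave identically because only the delimiter differs. The third finds the upper-case "HTTP" because of the modifier.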
What are the modifiers in regular expressions?
Types of PHP regular expression modifiers:
◆ i: matching becomes case-insensitive, so "a" and "A" are treated as the same.
◆ m: by default, "^" and "$" refer to the start and end of the whole string. With "m", they also match at the start and end of each line within the string.
◆ s: by default, "." matches any character except a newline. With "s", "." matches any character, including a newline.
◆ x: whitespace in the expression is ignored unless it is escaped or inside a character class.
◆ e: only meaningful for replacement, where the replacement string is evaluated as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback instead.
◆ A: the match must start at the beginning of the subject string. For example, "/a/A" matches "abcd" but not "bcda".
◆ D: the counterpart of "m". With "D", "$" matches only at the absolute end of the string, not before a trailing newline. PHP spells this modifier "D", not "E". It is off by default and is ignored when "m" is set.
◆ U: inverts greediness. Quantifiers become non-greedy by default, and a following "?" makes them greedy again.
Atoms in regular expressions
The atom is the smallest unit in a regular expression. Put simply, an atom is a piece of content to be matched. A valid regular expression must contain at least one atom.
Explanation: spaces, carriage returns, line feeds, 0-9, A-Z, a-z, Chinese characters, punctuation marks, and special symbols are all atoms.
Before the atom examples, let's first look at one function, preg_match.
Syntax: int preg_match (string $pattern, string $string [, array &$matches])
These are the commonly used parameters of preg_match. It returns 1 if the pattern matches, 0 if it does not, and false on error. The two remaining optional parameters, $flags and $offset, are rarely needed and are not covered here.
Let's verify this with an experiment:
<?php
header("Content-type: text/html; charset=utf-8"); // set the encoding
$zz = '/a/';
$string = 'ddfdjjvai2jfvkwkfi24';
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
Note: $zz is the regular expression and $string is the string being searched. The example checks whether the string matches the expression. If it does, the match is printed; if it does not, a "No match" message is printed.
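With preg_match available, the modifiers listed earlier can be tried out directly. The following is a minimal sketch rather than an exhaustive test; the sample strings and variable names are invented for the illustration.

```php
<?php
$text = "first line\nSecond line";   // made-up two-line sample

// i: case-insensitive, so "second" also finds "Second".
$i = preg_match('/second/i', $text);

// m: "^" also matches at the start of each line.
$withoutM = preg_match('/^Second/', $text);
$withM    = preg_match('/^Second/m', $text);

// s: "." also matches the newline between the two lines.
$withoutS = preg_match('/first.+Second/', $text);
$withS    = preg_match('/first.+Second/s', $text);

// A: the match has to begin at the first character of the subject.
$anchoredHit  = preg_match('/a/A', 'abcd');
$anchoredMiss = preg_match('/a/A', 'bcda');

// U: quantifiers stop as early as possible instead of as late as possible.
preg_match('/a.+b/', 'a1b2b', $greedy);
preg_match('/a.+b/U', 'a1b2b', $lazy);

var_dump($i, $withoutM, $withM, $withoutS, $withS, $anchoredHit, $anchoredMiss);
var_dump($greedy[0], $lazy[0]);
```

Each pair runs the same expression with and without one modifier, so any difference in the result comes from that modifier alone.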
Specially identified atoms
\d Matches 0-9
<?php
header("Content-type: text/html; charset=utf-8"); // set the encoding
// Usage of \d
$zz = '/\d/';
$string = '我爱喝9你爱不爱喝';
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
\D All characters except 0-9
<?php
// \D matches every character other than 0-9
$zz = '/\D/';
$string = '12124323453453';
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
\w a-z, A-Z, 0-9, and _
<?php
// \w matches a-zA-Z0-9 and the underscore
$zz = '/\w/';
$string = '新中_国万岁呀万岁';
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
\W The opposite of \w
<?php
// \W matches every character other than a-zA-Z0-9_
$zz = '/\W/';
$string = '......';
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
\s Matches all whitespace characters
<?php
// \s matches any whitespace character
$zz = '/\s/';
$string = "中国万
岁";
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
\S Non-whitespace characters
<?php
// \S matches any non-whitespace character
$zz = '/\S/';
$string = "
a ";
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
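[] Specifies a range of atoms
<?php
// [] specifies a range of atoms
$zz = '/[0-5]\w+/';
$string = '6a';   // 6 is outside 0-5, so this does not match
$string1 = '1C';  // try this one instead: 1 is inside 0-5
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>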
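Shorthands such as \w, \s, \W, and \S can be hard to remember, so here are their bracket equivalents. Each has the same effect as the shorthand:
\d is equivalent to [0-9]
\D is equivalent to [^0-9]
\w is equivalent to [a-zA-Z0-9_]
\W is equivalent to [^a-zA-Z0-9_]
\s is equivalent to [ \t\n\r\f\x0B] (space, tab, line feed, carriage return, form feed, vertical tab)
\S is equivalent to [^ \t\n\r\f\x0B]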
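The equivalence between a shorthand and its bracket form can be checked mechanically. The sketch below assumes plain ASCII input and no "u" modifier; the sample text and variable names are invented for the check.

```php
<?php
// Each shorthand is paired with the bracket expression it stands for.
$pairs = [
    '\d' => '[0-9]',
    '\D' => '[^0-9]',
    '\w' => '[a-zA-Z0-9_]',
    '\W' => '[^a-zA-Z0-9_]',
];

$sample = 'php_8 costs $0!';   // made-up sample text
$allEqual = true;
foreach ($pairs as $shorthand => $class) {
    // Collect every match of both forms and compare the two lists.
    preg_match_all('/' . $shorthand . '/', $sample, $a);
    preg_match_all('/' . $class . '/', $sample, $b);
    $same = ($a[0] === $b[0]);
    $allEqual = $allEqual && $same;
    echo $shorthand, ' vs ', $class, ': ', $same ? 'same' : 'different', "\n";
}
```

preg_match_all is used instead of preg_match so that every match in the sample is compared, not just the first one.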
+ Matches the preceding character at least 1 time
<?php
header("Content-type: text/html; charset=utf-8"); // set the encoding
$zz = '/\d+/';
$string = "迪奥和奥迪250都是我最爱";
// Later, try a string with no digits in it
//$string = "迪奥和奥迪都是我最爱";
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
* Matches the preceding character 0 times or any number of times
<?php
$zz = '/\w*/';
$string = "!@!@!!@#@!$@#!";
// Later, try a string that contains word characters
//$string1 = "!@#!@#!abcABC#@#!";
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
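One consequence of "0 times" is easy to miss: an expression made only of a starred atom always succeeds, because it can match nothing at all. A small sketch with a made-up subject string compares * and + on the same input.

```php
<?php
$subject = '!@#abc';   // made-up sample: symbols first, letters after

// "\w*" is satisfied by zero characters, so it succeeds at position 0
// with an empty match and never reaches "abc".
$starResult = preg_match('/\w*/', $subject, $star);

// "\w+" needs at least one character, so it moves on and finds "abc".
$plusResult = preg_match('/\w+/', $subject, $plus);

var_dump($starResult, $star[0], $plusResult, $plus[0]);
```

Both calls report a match, but only the + version returns the letters. When the goal is to find actual content, + is usually the safer choice.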
? The preceding character appears 0 or 1 times (optional)
<?php
$zz = '/ABC\d?ABC/';
$string = "ABC1ABC";
// Later, try the cases with several digits or no digit in the middle
//$string1 = "ABC888888ABC";
//$string2 = "ABCABC";
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
. (dot) Matches any character except \n
<?php
$zz = '/gg.+gg/';
$string = "ggABC1ABCgg";
// Without "gg" on both sides there is nothing for the expression to match
//$string = "ABC1ABC";
if (preg_match($zz, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
| (vertical bar) Means "or", and has the lowest priority
<?php
$zz = '/abc|bcd/';
$string1 = "abccd";
$string2 = "ggggbcd";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
From the example above we can see the following:
1. My first idea was that the expression would match abccd or abbcd. However, when $string1 and $string2 are tested, the matched results are abc and bcd.
2. The "or" applies to the whole of each side, so the result is abc or bcd. The vertical bar has a lower priority than characters written next to each other, which are grouped first.
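To get the behaviour described in point 1, the alternation has to be limited with parentheses. The sketch below compares the two forms; the grouped pattern and the extra subject "abbcd" are assumptions added for the comparison.

```php
<?php
$plain   = '/abc|bcd/';     // "abc" or "bcd"
$grouped = '/ab(c|b)cd/';   // "ab", then "c" or "b", then "cd"

$subjects = ['abccd', 'abbcd', 'ggggbcd'];
$results = [];
foreach ($subjects as $s) {
    preg_match($plain, $s, $m1);
    preg_match($grouped, $s, $m2);
    // Index 0 holds the full match; null marks "no match".
    $results[$s] = [$m1[0] ?? null, $m2[0] ?? null];
}
print_r($results);
```

The plain form returns only the three-character piece it found, while the grouped form returns the whole five-character string and rejects "ggggbcd".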
^ (circumflex) The string must start with what follows ^
<?php
$zz = '/^张杰好帅\w+/';
$string1 = "张杰好帅abccdaaaasds";
// $string2 does not start with 张杰好帅
$string2 = "帅abccdaaaasds";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
The experiment leads to the following conclusions:
1. $string1 matches and $string2 does not.
2. $string1 starts with the specified characters.
3. $string2 does not start with the characters after ^.
4. In words, this expression means: start with "张杰好帅", followed by at least one character from a-zA-Z0-9_.
$ (dollar sign) The string must end with what comes before $
<?php
$zz = '/\d+努力$/';
$string1 = "12321124333努力";
// $string2 ends with 力 only, not with 努力
$string2 = "12311124112313力";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
Note:
$string1 matches and $string2 does not. What comes before $ is \d+ followed by the Chinese word 努力, so the string has to end with that whole sequence. \d means a digit 0-9, and + means at least one of them.
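Using ^ and $ together forces the whole string to fit the expression, which is the usual way to validate input. Here is a minimal sketch; isAllDigits is a hypothetical helper written for this example, not a built-in function.

```php
<?php
// Hypothetical helper: true only when the whole string consists of digits.
function isAllDigits(string $s): bool
{
    // "^" pins the start and "$" pins the end. The D modifier stops "$"
    // from also matching just before a trailing newline.
    return preg_match('/^\d+$/D', $s) === 1;
}

var_dump(isAllDigits('12321'));    // digits only
var_dump(isAllDigits('123abc'));   // letters after the digits
var_dump(isAllDigits('abc123'));   // letters before the digits
var_dump(isAllDigits("123\n"));    // digits plus a trailing newline
```

Only the first call returns true. Without the anchors, all four strings would match, because each of them contains at least one digit.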
{m} Exactly m times; {n,m} at least n and at most m times
<?php
$zz = '/喝\d{1,3}酒/';
$string1 = "喝9酒";
//$string2 = "喝988酒";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
Note:
In the example above, \d{1,3} requires one, two, or three digits between 喝 and 酒. Any other number of digits fails to match.
{m,} At least m times, with no upper limit
<?php
$zz = '/喝\d{2,}/';
$string1 = "喝9";
//$string2 = "喝98";
//$string3 = "喝98122121";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
In the example above, \d{2,} requires at least two digits after 喝, with no upper limit. Therefore $string1 does not match, while $string2 and $string3 both match.
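The three brace forms can be compared side by side on the same inputs. In this sketch the patterns are anchored with ^ and $, which is an assumption added here so that the count applies to the whole string rather than to part of a longer run of digits.

```php
<?php
$patterns = [
    'exactly 2'  => '/^\d{2}$/',
    '1 to 3'     => '/^\d{1,3}$/',
    'at least 2' => '/^\d{2,}$/',
];
$inputs = ['9', '98', '988', '98122121'];

// Build a label => input => matched? table.
$table = [];
foreach ($patterns as $label => $pattern) {
    foreach ($inputs as $input) {
        $table[$label][$input] = preg_match($pattern, $input) === 1;
    }
}

// Print one row per pattern.
foreach ($table as $label => $row) {
    echo $label, ': ';
    foreach ($row as $input => $ok) {
        echo $input, '=', $ok ? 'yes' : 'no', ' ';
    }
    echo "\n";
}
```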
Tips for regular expressions
Write a little, test a little. Regular expressions need constant checking, so use preg_match to verify each piece as you write it. Once it matches, write the next piece, and continue until the whole expression is written and every test matches.
Next, let's write a complete example: a regular expression for email addresses.
Step one: list the email formats to support
liwenkai@phpxy.com
iwenkai@corp.baidu.cm
iwenkai@126.com
_w_k@xxx.com
2345@qq.com
Then build the expression piece by piece:
First, match the characters before the @ with \w+ (they are 0-9, A-Z, a-z, and _).
Second, match the @ character itself.
Third, write [a-zA-Z0-9-]+ for each part of the domain, because main domain names such as qq and 126 cannot contain underscores.
The domain looks like "corp.baidu." or "126.", which is how email suffixes usually look, so we can write ([a-zA-Z0-9-]+\.){1,2}. The \. escapes the dot so that it means a literal dot, and the group in parentheses must repeat at least once and at most twice.
Finally, follow it with com|cn|org|gov\.cn|net|edu\.cn and so on.
<?php
header("Content-type: text/html; charset=utf-8"); // set the encoding
$zz = '/\w+@([a-zA-Z0-9-]+\.){1,2}(com|cn|org|gov\.cn|net|edu\.cn)/';
$string1 = "k53981@qq.com";
if (preg_match($zz, $string1, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>
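In the "write a little, test a little" spirit, the sketch below compares a loose form of the email expression (dots unescaped, no anchors) with a stricter one (dots escaped, ^ and $ added). The stricter variant and the sample addresses are assumptions made for this comparison. Neither pattern is a complete email validator.

```php
<?php
// Loose: "." matches any character and the pattern may match a substring.
$loose  = '/\w+@([a-zA-Z0-9-]+.){1,2}(com|cn|org|gov.cn|net|edu.cn)/';
// Strict: literal dots only, and the whole string must be the address.
$strict = '/^\w+@([a-zA-Z0-9-]+\.){1,2}(com|cn|org|gov\.cn|net|edu\.cn)$/D';

// Made-up test inputs (not real mailboxes).
$candidates = [
    'user@example.com',
    'user_1@mail.example.cn',
    'user@exampleXcom',           // no dot before "com"
    'see user@example.com now',   // address buried in other text
];

$results = [];
foreach ($candidates as $address) {
    $results[$address] = [
        'loose'  => preg_match($loose, $address) === 1,
        'strict' => preg_match($strict, $address) === 1,
    ];
}
print_r($results);
```

The loose pattern accepts all four inputs, including "user@exampleXcom", because the unescaped dot matches the "X". The strict pattern accepts only the first two.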