Atoms in PHP regular expressions

Atom

The atom is the smallest unit in a regular expression. Put simply, an atom is the content to be matched. A valid regular expression must contain at least one atom.

All visible and invisible characters are atoms

Explanation: Spaces, carriage returns, line feeds, 0-9, A-Za-z, Chinese characters, punctuation marks, and special symbols are all atoms.

Before working through the atom examples, let's first introduce a function, preg_match:

int preg_match (string $regular, string $string[, array &$result])

Function: Matches the $string variable against the pattern in $regular. If a match is found, it returns 1 and puts the matched result into the $result variable. If no match is found, it returns 0 (and false if an error occurs).

Note: These are the parameters of preg_match that are used most often. The remaining two parameters are rarely needed, so they are not listed here.
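To make the return values concrete, here is a minimal sketch. The variable names ($pattern, $found, $missing, and so on) are chosen for illustration only and do not come from the lesson. It also shows that preg_match stops at the first match, so it returns 1 even when the pattern occurs several times.

```php
<?php
// Illustrative sketch: all variable names here are assumptions made for the example.
$pattern = '/wq/';

// The subject contains "wq", so preg_match returns 1 and fills $matches.
$found = preg_match($pattern, 'ssssswqaaaaaa', $matches);
var_dump($found);       // int(1)
var_dump($matches[0]);  // string(2) "wq"

// The subject has no "wq", so preg_match returns 0 and $noMatches is an empty array.
$missing = preg_match($pattern, 'abcdef', $noMatches);
var_dump($missing);     // int(0)

// preg_match stops after the first match: three occurrences still give 1.
$repeated = preg_match('/a/', 'aaa');
var_dump($repeated);    // int(1)
```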

Let's prove it through experiments:

<?php
// Define a variable named zz to hold the regular expression. If you are comfortable with English, naming the variable $pattern is easier to remember.
$zz = '/a/';

$string = 'ddfdjjvi2jfvkwkfi24';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The pattern looks for a, but $string contains no a, so the match fails.

<?php
$zz = '/wq/';

$string = 'ssssswqaaaaaa';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The string above contains wq after the run of s characters, so the match succeeds.

Next let’s try matching a space:

<?php
$zz = '/ /';

$string = 'sssssw aaaaa';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

When executed, var_dump prints a single matched string of length 1.

$string contains a space after the w character, so the match succeeds. The matched string is that space, which is simply invisible to the naked eye.

Specially identified atoms

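Atom    Description
\d      Matches one character in 0-9
\D      Matches any character except 0-9
\w      Matches a-zA-Z0-9_
\W      Matches any character except a-zA-Z0-9_
\s      Matches any whitespace character: \n, \t, \r, space
\S      Matches any non-whitespace character
[ ]     Matches an atom from the specified range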


Memorize these atoms, ideally well enough to write them out from memory. It helps to memorize them in pairs: \d matches one character in 0-9, so \D matches any character except 0-9.

The experiments that follow go through these atoms one at a time, so you will pick them up step by step.
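The pairing can be checked directly. The sketch below is only an illustration (the $pairs, $samples and $results names are assumptions, not part of the lesson): for each sample character it runs both atoms of a pair and records how many of the two matched, which should always be exactly one.

```php
<?php
// Illustrative sketch: each lowercase atom is paired with its uppercase complement.
$pairs = ['\d' => '\D', '\w' => '\W', '\s' => '\S'];
$samples = ['7', 'a', '_', ' ', "\n", '!'];

$results = [];
foreach ($pairs as $lower => $upper) {
    foreach ($samples as $char) {
        $a = preg_match('/' . $lower . '/', $char);
        $b = preg_match('/' . $upper . '/', $char);
        // Exactly one of the two atoms matches any single character.
        $results[] = $a + $b;
        printf("%s=%d %s=%d on %s\n", $lower, $a, $upper, $b, json_encode($char));
    }
}
```

Every line of output shows one 1 and one 0, which is why remembering one atom of a pair gives you the other for free.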

\d matches one character in 0-9

<?php
$zz = '/\d/';

$string = '我爱喝9你爱不爱喝';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

\D matches any character other than 0-9

<?php
$zz = '/\D/';

$string = '121243中23453453';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The match succeeds because the string contains 中, which is not a character in 0-9.
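One detail is worth knowing when the subject contains multi-byte characters. The sketch below is an illustration under two assumptions: the file is saved as UTF-8, and the variable names $byteMatch and $charMatch are made up for the example. Without the u modifier PCRE works byte by byte, so \D captures only the first byte of 中. With the u modifier the pattern and subject are treated as UTF-8 and the whole character is captured.

```php
<?php
// Illustrative sketch: assumes this file is saved as UTF-8.
$string = '121243中23453453';

// Without "u", \D matches a single byte: only the first byte of the 3-byte character.
preg_match('/\D/', $string, $byteMatch);
echo strlen($byteMatch[0]), "\n";  // 1

// With "u", \D matches the whole character.
preg_match('/\D/u', $string, $charMatch);
echo strlen($charMatch[0]), "\n";  // 3
echo $charMatch[0], "\n";          // 中
```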

\w Matches a-zA-Z0-9_

<?php
$zz = '/\w/';

$string = '新中_国万岁呀万岁';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The match is successful and the underscore is matched.

\W matches any character other than a-zA-Z0-9_

<?php
$zz = '/\W/';

$string = 'afasABCWEQR44231284737';

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The match fails, because every character in the string is in a-zA-Z0-9_ and there is nothing outside that set.

\s matches any whitespace character: \n, \t, \r, space

<?php
$zz = '/\s/';

$string = "中国万
岁";

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The match succeeds because the string contains a line break.

\S matches any non-whitespace character

<?php
$zz = '/\S/';

$string = "        
         a       ";

if(preg_match($zz, $string, $matches)){
   echo 'Matched, the result is:';
   var_dump($matches);
}else{
   echo 'No match';
}

?>

The match succeeds. Although the string is mostly spaces, a line break, and indentation, it contains one non-whitespace character, a.

[] matches an atom from a specified range

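<?php

$zz = '/[0-5]\w+/';

$string = '6a';

$string1 = '1C';

// Test each string in turn so that both results are visible.
foreach ([$string, $string1] as $subject) {
   echo $subject . ' => ';
   if(preg_match($zz, $subject, $matches)){
      echo 'Matched, the result is:';
      var_dump($matches);
   }else{
      echo 'No match' . PHP_EOL;
   }
}

?>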

Conclusion:
In the example above, the pattern fails to match $string but matches $string1. The first character of $string is 6, which is not in the range [0-5].

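<?php

$zz = '/[a-zA-Z0-9_]\w/';

$string = 'ab';

$string1 = '9A';

// Test each string in turn so that both results are visible.
foreach ([$string, $string1] as $subject) {
   echo $subject . ' => ';
   if(preg_match($zz, $subject, $matches)){
      echo 'Matched, the result is:';
      var_dump($matches);
   }else{
      echo 'No match' . PHP_EOL;
   }
}

?>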

Conclusion:

$string and $string1 both match successfully, because \w is equivalent to [a-zA-Z0-9_].

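<?php

$zz = '/[abc]\d+/';

$string = 'a9';

$string1 = 'b1';

$string2 = 'c5';

$string3 = 'd4';

// Test each string in turn so that all four results are visible.
foreach ([$string, $string1, $string2, $string3] as $subject) {
   echo $subject . ' => ';
   if(preg_match($zz, $subject, $matches)){
      echo 'Matched, the result is:';
      var_dump($matches);
   }else{
      echo 'No match' . PHP_EOL;
   }
}

?>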

Conclusion:

$string, $string1, and $string2 match successfully, but $string3 does not. $string3 starts with d, which is outside the set [abc].

[^characters] matches any character not in the specified range

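<?php

$zz = '/[^0-9A-Za-z_]/';

$string = 'aaaaab311dd';

$string1 = '!$@!#%$#^##';

// Test each string in turn so that both results are visible.
foreach ([$string, $string1] as $subject) {
   echo $subject . ' => ';
   if(preg_match($zz, $subject, $matches)){
      echo 'Matched, the result is:';
      var_dump($matches);
   }else{
      echo 'No match' . PHP_EOL;
   }
}

?>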

Conclusion:

1. Matching $string fails, but matching $string1 succeeds, because of the circumflex inside the square brackets.

2. A circumflex (^) at the start of the square brackets negates the set: the atom matches any character that is not listed inside the brackets.
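The position of the circumflex matters. As a minimal sketch (the variable names are made up for the example), the following compares a negated set with a set in which ^ is not the first character and is therefore an ordinary member.

```php
<?php
// Illustrative sketch: ^ negates only when it is the first character inside [].
$excluded = preg_match('/[^abc]/', 'a');  // 'a' is listed, so the negated set rejects it
$allowed  = preg_match('/[^abc]/', 'd');  // 'd' is not listed, so it matches
$literal  = preg_match('/[a^c]/', '^');   // here ^ is just one of the listed characters

var_dump($excluded);  // int(0)
var_dump($allowed);   // int(1)
var_dump($literal);   // int(1)
```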

Summary:

Atom    Description
\d      Matches one character in 0-9
\D      Matches any character except 0-9
\w      Matches a-zA-Z0-9_
\W      Matches any character except 0-9A-Za-z_
\s      Matches any whitespace character: \n, \t, \r, space
\S      Matches any non-whitespace character
[ ]     Matches an atom from the specified range

Atom    Equivalent
\w      [a-zA-Z0-9_]
\W      [^a-zA-Z0-9_]
\d      [0-9]
\D      [^0-9]
\s      [ \t\n\f\r]
\S      [^ \t\n\f\r]
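The equivalences can be spot-checked with a short loop. This is only a sketch under stated assumptions: it covers ASCII bytes 0-127 without the u modifier, and the names $equivalents and $mismatches are invented for the example. The \s and \S rows are left out on purpose, because depending on the PCRE version \s may also match the vertical tab, which the bracket form in the table does not list.

```php
<?php
// Illustrative sketch: compare each shorthand with its bracket form on ASCII bytes 0-127.
$equivalents = [
    '\w' => '[a-zA-Z0-9_]',
    '\W' => '[^a-zA-Z0-9_]',
    '\d' => '[0-9]',
    '\D' => '[^0-9]',
];

$mismatches = 0;
foreach ($equivalents as $shorthand => $class) {
    for ($code = 0; $code < 128; $code++) {
        $char = chr($code);
        // Count every byte on which the two forms disagree.
        if (preg_match('/' . $shorthand . '/', $char) !== preg_match('/' . $class . '/', $char)) {
            $mismatches++;
        }
    }
}
echo "Mismatches: $mismatches\n";
```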