This example walks through a typical use of the PHP function preg_match_all. Requirement: extract the ID and the content of each DIV element in a string, that is, the IDs biuuu, biuuu_2 and biuuu_3 and the contents php自学网, php自学网2 and php自学网3 (php自学网, "PHP self-study network", is a site name). Many common web-scraping techniques extract content with exactly this kind of match.
Analysis: the string is a simple HTML fragment in which each DIV element has one ID and one piece of content, independent of the other elements. So first work out how to extract the ID value and the content from a single DIV, such as &lt;div id="biuuu"&gt;php自学网&lt;/div&gt;, and then apply the same match to the other, similar elements. Two values must be extracted from each DIV, so two sub-expressions are needed: the first matches the ID value (biuuu) and the second matches the content (php自学网). In regular expressions, sub-expressions are captured with parentheses, so the element above takes the following form:
<ol class="dp-xml"> <li class="alt"><span><span class="tag">&lt;</span><span class="tag-name">div</span><span> </span><span class="attribute">id</span><span>=</span><span class="attribute-value">"(biuuu)"</span><span class="tag">&gt;</span><span>(php自学网)</span><span class="tag">&lt;/</span><span class="tag-name">div</span><span class="tag">&gt;</span></span></li> <li class="alt"><span><span class="tag">&lt;</span><span class="tag-name">div</span><span> </span><span class="attribute">id</span><span>=</span><span class="attribute-value">"(Expression 1)"</span><span class="tag">&gt;</span><span>(Expression 2)</span><span class="tag">&lt;/</span><span class="tag-name">div</span><span class="tag">&gt;</span></span></li> </ol>
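A minimal sketch of what the parentheses buy you, assuming the complete pattern with Expression 1 and Expression 2 already filled in (both are worked out in the steps that follow): preg_match() applied to a single DIV returns the whole match in $m[0] and each parenthesized group in $m[1] and $m[2]. The variable names $div, $pattern and $m are illustrative only.

```php
<?php
// A single DIV, matched with preg_match() (first match only)
// rather than preg_match_all().
$div = '<div id="biuuu">php自学网</div>';
$pattern = '/<div\sid="([a-zA-Z0-9_]+)">([^<>]+)<\/div>/';

if (preg_match($pattern, $div, $m) === 1) {
    // $m[0] is the whole match, $m[1] the first group, $m[2] the second.
    echo $m[1], "\n"; // biuuu
    echo $m[2], "\n"; // php自学网
}
```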
The parentheses mark off the parts that need to be captured. The next step is deciding what each sub-expression should match. Assume an ID consists only of letters, digits and underscores; then a character class in square brackets is enough:
Expression 1: [a-zA-Z0-9_]+ (matches one or more uppercase or lowercase letters, digits or underscores)
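A quick way to check Expression 1 on its own, as a sketch: the ^ and $ anchors are added here only so that the whole test string must match, and the IDs Biuuu_3 and biuuu-4 are hypothetical samples, not from the original example.

```php
<?php
// Expression 1 with ^ and $ anchors (added for this check only), so the
// WHOLE string must consist of letters, digits and underscores.
$idPattern = '/^[a-zA-Z0-9_]+$/';

// Hypothetical sample IDs; the last one contains a hyphen.
foreach (['biuuu', 'biuuu_2', 'Biuuu_3', 'biuuu-4'] as $id) {
    $ok = preg_match($idPattern, $id) === 1;
    echo $id, ': ', $ok ? 'match' : 'no match', "\n";
}
```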
Expression 2 needs more care. The content of the DIV can be any character, except that it must not match &lt; or &gt;. If it could match those two characters, the match would run on past the closing tag and swallow all the following DIVs as well. So these characters have to be excluded with a negated character class:
Expression 2: [^&lt;&gt;]+ (matches one or more characters other than &lt; and &gt;)
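To illustrate why the negated class matters, the sketch below runs the same pattern twice on a hypothetical two-DIV string: once with a greedy .+ in place of Expression 2, and once with [^&lt;&gt;]+.

```php
<?php
// Hypothetical sample with two DIVs on one line.
$html = '<div id="a">one</div><div id="b">two</div>';

// Greedy .+ runs on to the LAST </div>, swallowing the second DIV.
preg_match_all('/<div\sid="([a-zA-Z0-9_]+)">(.+)<\/div>/', $html, $greedy);

// [^<>]+ cannot cross a tag, so each DIV is matched separately.
preg_match_all('/<div\sid="([a-zA-Z0-9_]+)">([^<>]+)<\/div>/', $html, $strict);

print_r($greedy[2]); // ONE element: one</div><div id="b">two
print_r($strict[2]); // two elements: one, two
```

A lazy quantifier (.+?) would also stop at the first closing tag here, but [^&lt;&gt;]+ additionally refuses to match a DIV whose content contains other tags.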
That completes the two sub-expressions that preg_match_all has to capture. They still need to be embedded in a full pattern that matches the whole DIV element:
Expression: /&lt;div\sid="(Expression 1)"&gt;(Expression 2)&lt;\/div&gt;/ (\s matches the whitespace between div and id)
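Note that the forward slash in &lt;/div&gt; must be escaped as \/, because / is used as the pattern delimiter. The double quotes need no escaping, since the pattern is written in a single-quoted PHP string. Substituting the two sub-expressions gives: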
<ol class="dp-xml"> <li class="alt"><span><span>/</span><span class="tag">&lt;</span><span class="tag-name">div</span><span>\s</span><span class="attribute">id</span><span>=</span><span class="attribute-value">"([a-zA-Z0-9_]+)"</span><span class="tag">&gt;</span><span>([^&lt;&gt;]+)</span><span class="tag">&lt;\/</span><span class="tag-name">div</span><span class="tag">&gt;</span><span>/</span></span></li> </ol>
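Two equivalent ways of handling the slash, as a sketch on a hypothetical two-DIV sample: escape it as \/ when / is the delimiter, or choose another delimiter such as # so that no escaping is needed. When part of a pattern comes from a variable, preg_quote($str, '/') escapes the regex metacharacters together with the given delimiter.

```php
<?php
// Hypothetical two-DIV sample.
$html = '<div id="biuuu">php自学网</div><div id="biuuu_2">php自学网2</div>';

// With / as the delimiter, the slash in </div> must be written as \/.
$slashPattern = '/<div\sid="([a-zA-Z0-9_]+)">([^<>]+)<\/div>/';

// With # as the delimiter, the slash needs no escaping at all.
$hashPattern = '#<div\sid="([a-zA-Z0-9_]+)">([^<>]+)</div>#';

preg_match_all($slashPattern, $html, $a);
preg_match_all($hashPattern, $html, $b);
var_dump($a === $b); // bool(true): both patterns capture the same data
```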
This yields a regular expression that matches the ID value and the content of each DIV element. Now test it with the preg_match_all function:
<ol class="dp-xml"> <li class="alt"><span>&lt;?php</span></li> <li class="alt"><span><span>$</span><span class="attribute">html</span><span> = </span><span class="attribute-value">'&lt;div id="biuuu"&gt;php自学网&lt;/div&gt;</span></span></li> <li class="alt"><span><span class="attribute-value">&lt;div id="biuuu_2"&gt;php自学网2&lt;/div&gt;</span></span></li> <li class="alt"><span><span class="attribute-value">&lt;div id="biuuu_3"&gt;php自学网3&lt;/div&gt;'</span><span>;</span></span></li> <li><span>// Group 1 captures the id attribute, group 2 the text inside the DIV</span></li> <li><span>preg_match_all('/&lt;div\sid="([a-zA-Z0-9_]+)"&gt;([^&lt;&gt;]+)&lt;\/div&gt;/', $html, $result);</span></li> <li class="alt"><span>var_dump($result);</span></li> </ol>
Output of the preg_match_all example:
<ol class="dp-xml"> <li class="alt"><span>array(3) {</span></li> <li><span>[0]=&gt; array(3)</span></li> <li class="alt"><span>{</span></li> <li><span>[0]=&gt; string(30) "&lt;div id="biuuu"&gt;php自学网&lt;/div&gt;"</span></li> <li class="alt"><span>[1]=&gt; string(33) "&lt;div id="biuuu_2"&gt;php自学网2&lt;/div&gt;"</span></li> <li><span>[2]=&gt; string(33) "&lt;div id="biuuu_3"&gt;php自学网3&lt;/div&gt;"</span></li> <li class="alt"><span>}</span></li> <li><span>[1]=&gt; array(3)</span></li> <li class="alt"><span>{</span></li> <li><span>[0]=&gt; string(5) "biuuu"</span></li> <li class="alt"><span>[1]=&gt; string(7) "biuuu_2"</span></li> <li><span>[2]=&gt; string(7) "biuuu_3"</span></li> <li class="alt"><span>}</span></li> <li><span>[2]=&gt; array(3)</span></li> <li class="alt"><span>{</span></li> <li><span>[0]=&gt; string(8) "php自学网"</span></li> <li class="alt"><span>[1]=&gt; string(9) "php自学网2"</span></li> <li><span>[2]=&gt; string(9) "php自学网3"</span></li> <li class="alt"><span>}</span></li> <li><span>}</span></li> </ol>
The result array has three entries: $result[0] holds the full text of each match, $result[1] the values captured by the first sub-expression (the IDs) and $result[2] those captured by the second (the contents). This is how the ID and content of every DIV element are extracted. The key to using regular expressions is to be clear about exactly what needs to be captured and then to build the pattern up piece by piece. Dumping the result of preg_match_all with var_dump makes it easy to check and debug the pattern.
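Building on that result structure, here is a sketch of two ways to pair each ID with its content. It assumes the IDs are unique: array_combine() keeps only the last content for a repeated ID. PREG_SET_ORDER is a standard preg_match_all() flag that groups the captures by match instead of by sub-expression (the default, PREG_PATTERN_ORDER, produces the layout shown above).

```php
<?php
$html = '<div id="biuuu">php自学网</div>
<div id="biuuu_2">php自学网2</div>
<div id="biuuu_3">php自学网3</div>';
$pattern = '/<div\sid="([a-zA-Z0-9_]+)">([^<>]+)<\/div>/';

// Default PREG_PATTERN_ORDER: $result[1] = IDs, $result[2] = contents,
// so the two parallel lists can be zipped into an id => content map.
preg_match_all($pattern, $html, $result);
$byId = array_combine($result[1], $result[2]);

// PREG_SET_ORDER: one entry per match, each holding [full, id, content].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[1], ' => ', $set[2], "\n";
}
```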