In PHP, if you fetch a web page with file_get_contents and echo the result directly, the browser parses the markup and renders it as a web page again.
To display the source instead, convert the fetched content with htmlspecialchars. The goal here is then to extract the text of every link from it.
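A minimal sketch of what the conversion does to the markup, assuming a hard-coded sample string ($html is hypothetical and stands in for the fetched page):

```php
<?php
// Hypothetical sample standing in for content returned by file_get_contents().
$html = '<a href="/news">News</a>';

// Escaped copy: echoing this shows the markup itself instead of a rendered link.
$escaped = htmlspecialchars($html);

echo $escaped, PHP_EOL;
// &lt;a href=&quot;/news&quot;&gt;News&lt;/a&gt;

// The literal angle brackets no longer exist in the escaped string.
var_dump(strpos($escaped, '<'));     // bool(false)
var_dump(strpos($escaped, '&lt;'));  // int(0)
```

Any string search on the escaped copy therefore has to look for the entities, not for the original characters.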
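Extraction runs into a problem at this point: in the htmlspecialchars-converted content, < and > have become &lt; and &gt;, so the search has to look for the entities rather than the literal characters.
The extraction used on the converted content is: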

    $word = substr($str, strpos($str, '&gt;', 5) + 4, strpos($str, '&lt;', 10) - strpos($str, '&gt;', 5) - 4);

    function captureKeyArray($url)
    {
        $content = file_get_contents($url);
        // Assumed pattern: one match per <a ...>...</a> element.
        $pattern = '/<a\b[^>]*>.*<\/a>/imsU';
        $match = array();
        preg_match_all($pattern, $content, $match);
        $matchFilter = array();
        foreach ($match[0] as $key => $val)
        {
            $str = htmlspecialchars($val);
            // Skip links that wrap an image.
            if (strpos($str, "img") !== false)
            {
                continue;
            }
            // Searching for '<' fails here: htmlspecialchars() has already turned
            // < and > into &lt; and &gt;. '&gt;' is 4 characters long, hence the 4.
            $word = substr($str, strpos($str, '&gt;', 5) + 4, strpos($str, '&lt;', 10) - strpos($str, '&gt;', 5) - 4);
            if ($word != "")
            {
                array_push($matchFilter, $word);
            }
        }
        return $matchFilter;
    }

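One way to sidestep the entity problem is to extract from the raw HTML first and escape only what gets displayed. The following is a sketch under that assumption, not the original author's method: captureLinkTexts, its pattern and the $sample string are all hypothetical, and it takes an HTML string rather than a URL so it can run without network access.

```php
<?php
/**
 * Hypothetical helper: returns the text of every <a> element that does
 * not contain an <img>, working on the raw (unescaped) HTML.
 */
function captureLinkTexts(string $html): array
{
    $texts = [];
    // Group 1 captures everything between <a ...> and </a>.
    preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $html, $matches);
    foreach ($matches[1] as $inner) {
        if (stripos($inner, '<img') !== false) {
            continue; // skip image links
        }
        $text = trim(strip_tags($inner));
        if ($text !== '') {
            // Escape only now, when the text is ready for display.
            $texts[] = htmlspecialchars($text);
        }
    }
    return $texts;
}

// Hypothetical sample markup.
$sample = '<a href="/a">Home</a> <a href="/b"><img src="x.png"></a> <a href="/c">R&D</a>';
print_r(captureLinkTexts($sample));
```

Because the capture group already isolates the link text, no offset arithmetic with strpos and substr is needed, and the extraction does not depend on how long each entity is.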