From: http://www.uphtm.com/php/253.html
Developers use this technique fairly often; we used it in a project to collect friendly links from other websites. Today I saw that a friend had put together a PHP snippet that gets all the links in a specified URL's page, so let's take a look at it.
The following code obtains all links in the specified URL's page, that is, the href attribute of every a tag:
// Get the HTML code of the page
$html = file_get_contents('http://www.111cn.net');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate('/html/body//a');
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url  = $href->getAttribute('href');
    echo $url.'<br />';
}
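The same idea can be sketched without XPath: DOMDocument's getElementsByTagName('a') also walks every anchor in document order. The following is a minimal variant, assuming we parse a local HTML string instead of fetching a live page; the function name extract_hrefs is hypothetical, not from the original article.

```php
<?php
// Collect the href attribute of every <a> tag in an HTML string.
function extract_hrefs(string $html): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings on malformed HTML
    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $urls[] = $a->getAttribute('href');
    }
    return $urls;
}
```

Because it takes the HTML as a string, the same function works whether the markup comes from file_get_contents(), cURL, or a test fixture.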
This code gets the href attribute of every a tag, but an href value is not necessarily an absolute link. We can filter the results and keep only addresses starting with http:
// Get the HTML code of the page
$html = file_get_contents('http://www.111cn.net');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate('/html/body//a');
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url  = $href->getAttribute('href');

    // Keep links starting with http
    if (substr($url, 0, 4) == 'http')
        echo $url.'<br />';
}
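Note that the substr check also matches https:// links (since 'http' is a prefix of 'https'), but it would equally accept a malformed value like 'httpfoo'. A stricter filter can inspect the parsed scheme; this is a sketch, and the helper name is_http_link is my own, not from the article.

```php
<?php
// Accept only URLs whose scheme is exactly http or https.
function is_http_link(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return in_array($scheme, ['http', 'https'], true);
}
```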
The next example uses fopen() to read all the links in a specified web page and number them. It is suitable for places where web page content needs to be collected. Here the Baidu homepage is read as an example and every link address on it is printed out; the code has been tested and works:
<?php
if (empty($url)) $url = "http://www.baidu.com/"; // URL of the page to collect links from
$site = substr($url, 0, strpos($url, "/", 8));   // site root, e.g. http://www.baidu.com
$base = substr($url, 0, strrpos($url, "/") + 1); // directory the file is in
$fp = fopen($url, "r");                          // open the URL's page
$contents = "";
while (!feof($fp)) $contents .= fread($fp, 1024);
$pattern = "|href=['\"]?([^'\" ]+)['\" ]|U";
preg_match_all($pattern, $contents, $regArr, PREG_SET_ORDER); // match every href=
for ($i = 0; $i < count($regArr); $i++) {
    // eregi() was removed in PHP 7; strpos() checks for "://" instead
    if (strpos($regArr[$i][1], "://") === false) {   // relative path, i.e. no ://
        if (substr($regArr[$i][1], 0, 1) == "/")     // root-relative to the site
            echo "link".($i+1).": ".$site.$regArr[$i][1]."<br />"; // site root
        else
            echo "link".($i+1).": ".$base.$regArr[$i][1]."<br />"; // current directory
    } else {
        echo "link".($i+1).": ".$regArr[$i][1]."<br />";           // absolute path
    }
}
fclose($fp);
?>
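The three echo branches above all implement the same rule: absolute URLs pass through, root-relative paths get the site prefix, and everything else gets the base directory. That rule can be condensed into one helper, sketched below; resolve_link is a hypothetical name for illustration.

```php
<?php
// Resolve an href against the site root and the current directory,
// mirroring the branch logic of the fopen() example.
function resolve_link(string $href, string $site, string $base): string
{
    if (strpos($href, '://') !== false) {
        return $href;             // already absolute
    }
    if (substr($href, 0, 1) === '/') {
        return $site . $href;     // root-relative: prepend the site root
    }
    return $base . $href;         // relative to the current directory
}
```

Keeping the resolution in one function also makes it easy to test each branch on its own, which the inline echo statements do not allow.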
The above has introduced how to get all the links in a specified URL's page with PHP, including the full code. I hope it is helpful to friends interested in PHP tutorials.