A URL completion function, which could also be called FormatUrl, is essential for any collection (scraping) program.
I wrote this function while developing a collection program. When collecting articles, you will often find that the links in a page are relative paths or absolute root paths rather than absolute full paths, so the URLs cannot be collected as they are.
This function therefore rewrites every hyperlink in the page source so that the correct URLs can be collected directly.
Background on path types
Relative path: starts with "../" or "./", or has no prefix at all
Absolute root path: /path/xxx.html
Absolute full path: http://www.xxx.com/path/xxx.html
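As a rough illustration of the three categories above, the sketch below classifies a link by its prefix. The function name `classify_path` and the sample links are hypothetical and are not part of the original code; protocol-relative links such as "//host/path" are not treated specially here.

```php
<?php
// Hypothetical helper: report which of the three path types a link belongs to.
function classify_path(string $link): string
{
    // A scheme followed by "://" marks an absolute full path.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return 'absolute full path';
    }
    // A leading slash marks an absolute root path.
    if ($link !== '' && $link[0] === '/') {
        return 'absolute root path';
    }
    // Everything else ("../", "./", or no prefix) is relative to the current page.
    return 'relative path';
}

$samples = ['../img/a.png', './b.html', 'c.html', '/path/xxx.html', 'http://www.xxx.com/path/xxx.html'];
foreach ($samples as $link) {
    echo $link, ' => ', classify_path($link), PHP_EOL;
}
```

Only the last two categories can be resolved without knowing the page the link came from, which is why the completion function takes the source URL as its second argument.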
Usage example:
// $surl is the URL of the page the HTML was collected from
$gethtm = '<a href="...">Homepage</a><a href="...">Resolution</a>';
echo formaturl($gethtm, $surl);
Demo example
Original path code: http://www.newnew.cn/newnewindex.aspx
Output demo code: http://www.maifp.com/aaa/test.php
The following is the function code
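function formaturl($l1, $l2) {
    // $l1: HTML source to process; $l2: URL of the page the HTML came from.
    // Match <img src=...> and <a href=...> tags with double- or single-quoted values.
    $pattern = "/(<img[^>]+src=\"([^\"]+)\"[^>]*>)|(<a[^>]+href=\"([^\"]+)\"[^>]*>)|(<img[^>]+src='([^']+)'[^>]*>)|(<a[^>]+href='([^']+)'[^>]*>)/i";
    if (preg_match_all($pattern, $l1, $regs)) {
        foreach ($regs[0] as $num => $url) {
            $l1 = str_replace($url, lIIIIl($url, $l2), $l1);
        }
    }
    return $l1;
}

function lIIIIl($l1, $l2) {
    // $l1: a single tag; $l2: URL of the source page.
    // $I2 holds the attribute value with its quotes, $I1 the same value without them.
    $I2 = "";
    if (preg_match("/(.*)(href|src)=(.+?)( |\/>|>).*/i", $l1, $regs)) { $I2 = $regs[3]; }
    $I1 = str_replace(array(chr(34), chr(39)), "", $I2);
    if (strlen($I1) == 0) { return $l1; }
    $url_parsed = parse_url($l2);
    $scheme = isset($url_parsed["scheme"]) ? $url_parsed["scheme"] . "://" : "";
    $host = isset($url_parsed["host"]) ? $url_parsed["host"] : "";
    $l3 = $scheme . $host;
    if (strlen($l3) == 0) { return $l1; }
    // Directory of the source page; the site root becomes an empty string.
    $path = dirname(isset($url_parsed["path"]) ? $url_parsed["path"] : "/");
    if ($path == "/" || $path == "\\" || $path == ".") { $path = ""; }
    $pos = strpos($I1, "#");
    if ($pos > 0) { $I1 = substr($I1, 0, $pos); }
    // Determine the link type
    if (preg_match("/^(http|https|ftp):\/\//i", $I1)) { return $l1; } // already a full URL: skip it
    elseif ($I1[0] == "/") { $I1 = $l3 . $I1; } // absolute root path
    elseif (substr($I1, 0, 3) == "../") { // relative path into a parent directory
        while (substr($I1, 0, 3) == "../") {
            $I1 = substr($I1, 3);
            if (strlen($path) > 0) { $path = dirname($path); }
            if ($path == "/" || $path == "\\") { $path = ""; }
        }
        $I1 = $l3 . $path . "/" . $I1;
    }
    elseif (substr($I1, 0, 2) == "./") { // relative path in the current directory
        $I1 = $l3 . $path . substr($I1, 1);
    }
    elseif (preg_match("/^(mailto|javascript):/i", $I1)) { // not a fetchable URL: skip it
        return $l1;
    }
    else { $I1 = $l3 . $path . "/" . $I1; } // relative path without a prefix
    return str_replace($I2, "\"$I1\"", $l1);
}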
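The prefix checks above can also be expressed as a single normalization step. The following is only a sketch of that alternative, not the author's method: it splits the combined path on "/" and collapses "." and ".." segments with a stack. The name `resolve_url` is hypothetical, and protocol-relative links ("//host/path") are assumed not to occur.

```php
<?php
// Hypothetical helper, not part of the original code: resolve $link against $base.
function resolve_url(string $link, string $base): string
{
    // Links that already carry a scheme (http:, mailto:, javascript:, ...) are returned unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }
    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $link; // the base is not an absolute URL, so nothing can be completed
    }
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    $basePath = $parts['path'] ?? '/';

    // Split off the query string and fragment so only the path is normalized.
    $suffix = '';
    if (preg_match('/^([^?#]*)([?#].*)$/', $link, $m)) {
        $link = $m[1];
        $suffix = $m[2];
    }

    if ($link === '') {
        $path = $basePath;                 // e.g. "#top": keep the base page
    } elseif ($link[0] === '/') {
        $path = $link;                     // absolute root path
    } else {
        // Relative path: append to the directory of the base page.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;
    }

    // Collapse "." and ".." segments; never climb above the site root.
    $out = [];
    $seg = '';
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    if ($seg === '.' || $seg === '..') {
        $out[] = '';                       // keep the trailing slash of a directory
    }
    return $root . implode('/', $out) . $suffix;
}

$base = 'http://www.maifp.com/aaa/test.php';
foreach (['../img/a.png', './b.html', 'c.html', '/path/xxx.html'] as $link) {
    echo resolve_url($link, $base), PHP_EOL;
}
```

Unlike formaturl, this sketch works on one link at a time and keeps query strings and fragments, so it would still need the tag-matching step to be applied to a whole page.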
The link below is a place to learn PHP regular expressions. I am leaving it here so that it does not get lost.