The goal is to turn the URLs in a piece of content into hyperlinks automatically, similar to what Weibo does: each URL is extracted, wrapped in an A tag, and so becomes a real link. The content to be processed mixes existing A tags, IMG tags, and plain-text URLs.
I searched online for a long time without finding a practical solution. Most approaches simply match every URL, so the addresses inside existing A tags and IMG tags are matched and replaced as well, which does not meet the need. I did not find a way for a regular expression to skip A tags during extraction, so I changed my approach and took an indirect route: first replace all A tags and IMG tags with unified placeholder tags, then find the remaining URLs and replace them with hyperlinks, and finally restore the placeholders to the original A tags and IMG tags.
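To make the problem concrete, here is a minimal sketch of the naive approach, in which every URL-looking substring is linked regardless of where it sits. The function name `naiveLink` and the sample string are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: link every URL-looking substring, ignoring tags.
function naiveLink(string $html): string
{
    // Matches http:// and ftp:// followed by typical URL characters.
    $pattern = '/(?:ht|f)tp:\/\/[-a-zA-Z0-9@:%_\/+.~#?&=]+/';
    return preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $html);
}

$input = '<a href="http://baidu.com">Baidu</a> and http://www.jb51.net';
echo naiveLink($input), PHP_EOL;
```

The address inside the existing `href` attribute is wrapped a second time, so the original A tag ends up containing another A tag in its attribute value. This is the behavior that masking the tags first is meant to avoid.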
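function linkAdd($content) {
    // Extract and replace all A tags (unified tag <{link}>); this pattern is an assumption that matches <a ...>...</a> pairs
    preg_match_all('/<a\b[^>]*>.*?<\/a>/is', $content, $linkList);
    $linkList = $linkList[0];
    $str = preg_replace('/<a\b[^>]*>.*?<\/a>/is', '<{link}>', $content);
    // Extract and replace all IMG tags (unified tag <{img}>); matched on $str so IMG tags inside masked A tags are not counted twice
    preg_match_all('/<img[^>]+>/im', $str, $imgList);
    $imgList = $imgList[0];
    $str = preg_replace('/<img[^>]+>/im', '<{img}>', $str);
    // Extract and replace the standard URL address (s? also accepts https and ftps)
    $str = preg_replace('(((f|ht){1}tps?://)[-a-zA-Z0-9@:%_/+.~#?&//=]+)', '<a href="$0" target="_blank">$0</a>', $str);
    // Restore the unified tags to the original A and IMG tags, in document order
    $str = preg_replace_callback('/<\{link\}>/', function ($m) use (&$linkList) { return array_shift($linkList); }, $str);
    return preg_replace_callback('/<\{img\}>/', function ($m) use (&$imgList) { return array_shift($imgList); }, $str);
}
$content = '<a href="http://baidu.com">http://baidu.com</a>This is the first A tag,
<a href="http://blog.baidu.com">Growth Footprints - Focus on Internet Development</a>This is the second A tag.
http://www.jb51.net This is the first URL address that needs to be extracted.
http://blog.baidu.com This is the second URL address that needs to be extracted. <img src="..." />, this is an IMG tag';
echo linkAdd($content);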
The content returned is:
<a href="http://baidu.com">http://baidu.com</a>This is the first A tag,
<a href="http://blog.baidu.com">Growth Footprints - Focus on Internet Development</a>This is the second A tag.
<a href="http://www.jb51.net" target="_blank">http://www.jb51.net</a> This is the first URL address that needs to be extracted.
<a href="http://blog.baidu.com" target="_blank">http://blog.baidu.com</a> This is the second URL address that needs to be extracted. <img src="..." />, this is an IMG tag
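As a possible alternative to masking and restoring, the same result can be approached in one pass: a PCRE alternation can consume whole A elements and IMG tags first, and a callback can then link only what the URL branch captured. The sketch below is an assumption-laden illustration rather than the method used above; `linkBareUrls`, the named group `url`, and the `example.com` address are all hypothetical.

```php
<?php
// Hypothetical helper: link bare URLs in one pass, leaving A and IMG tags alone.
function linkBareUrls(string $html): string
{
    $pattern = '~<a\b[^>]*>.*?</a>'   // an existing A element, consumed whole
             . '|<img\b[^>]*>'        // an existing IMG tag
             . '|(?P<url>(?:https?|ftp)://[-a-zA-Z0-9@:%_/+.\~#?&=]+)~is'; // a bare URL

    return preg_replace_callback($pattern, function (array $m): string {
        // The tag branches leave the "url" group unset or empty: return them unchanged.
        if (empty($m['url'])) {
            return $m[0];
        }
        return '<a href="' . $m['url'] . '" target="_blank">' . $m['url'] . '</a>';
    }, $html);
}

$html = '<a href="http://baidu.com">http://baidu.com</a> first, '
      . 'http://www.jb51.net second <img src="http://example.com/p.png">';
echo linkBareUrls($html), PHP_EOL;
```

Because the regex engine tries the tag branches at each position before the URL branch, an address that sits inside a tag is swallowed together with the tag and never reaches the linking code. Like the masking approach, this is still pattern matching on HTML, so unusual markup (for example a `>` inside an attribute value) can defeat it.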
function replace_URLtolink($text) {
    // grab anything that looks like a URL...
    // build the patterns
    $scheme = '(https?:\/\/|ftps?:\/\/)?';
    $www = '([\w]+\.)';
    $ip = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})';
    $name = '([\w0-9]+)';
    $tld = '(\w{2,4})';
    $port = '(:[0-9]+)?';
    $the_rest = '(\/?([\w#!:.?+=&%@!\-\/]+))?';
    $pattern = $scheme.'('.$ip.$port.'|'.$www.$name.$tld.$port.')'.$the_rest;
    $pattern = '/'.$pattern.'/is';
    // Replace all the URLs in a single pass, so that markup inserted for
    // one link is never matched and replaced again
    return preg_replace_callback($pattern, function ($m) {
        $url = $m[0];
        // $m[1] holds the scheme; prepend http:// only when none was matched
        $fullurl = ($m[1] !== '') ? $url : 'http://'.$url;
        return '<a href="'.$fullurl.'">'.$url.'</a>';
    }, $text);
}
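One detail worth spelling out when replacing URLs is the difference between substituting collected matches one at a time with `str_replace` and substituting them in a single regex pass. A minimal sketch follows; the helper names `linkOneByOne` and `linkInOnePass`, the simplified pattern, and the sample text are assumptions made for illustration.

```php
<?php
// Hypothetical helpers for illustration only.
function linkOneByOne(string $text, array $urls): string
{
    foreach ($urls as $url) {
        // str_replace rewrites every occurrence, including those inside
        // anchors produced by an earlier iteration.
        $text = str_replace($url, '<a href="' . $url . '">' . $url . '</a>', $text);
    }
    return $text;
}

function linkInOnePass(string $text, string $pattern): string
{
    // Each match is visited exactly once; inserted markup is not rescanned.
    return preg_replace_callback($pattern, function (array $m): string {
        return '<a href="' . $m[0] . '">' . $m[0] . '</a>';
    }, $text);
}

$text = 'http://a.com and http://a.com';
$pattern = '~https?://[\w.-]+~';
preg_match_all($pattern, $text, $found);

echo linkOneByOne($text, $found[0]), PHP_EOL;
echo linkInOnePass($text, $pattern), PHP_EOL;
```

When the same URL occurs twice, or one URL is a substring of another, the one-by-one version wraps text that has already been linked and produces nested anchors, while the single-pass version links each occurrence once.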