PHP thief program example code

A so-called thief program uses specific PHP functions to collect content from other people's websites, then uses regular expressions to pick out the parts we want and saves them to our own local database. This article introduces how to implement such a program in PHP. Readers who need it can use it as a reference.

The file_get_contents() function is the key to the data collection process below. Let's first look at its syntax:

string file_get_contents ( string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )
It works like file(), except that file_get_contents() returns the file as a string. Reading starts at the position given by the offset parameter and continues for up to maxlen bytes. On failure, file_get_contents() returns FALSE.

file_get_contents() is the preferred way to read the contents of a file into a string. If the operating system supports it, memory mapping is used to improve performance.
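The offset and maxlen parameters and the FALSE return value can be tried without any network access. The following is a minimal sketch that assumes a temporary local file; the file name and its contents are made up for the demonstration.

```php
<?php
// Hypothetical sample file, created only for this demonstration.
$path = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($path, 'Hello, file_get_contents!');

// Read the whole file into a string.
$all = file_get_contents($path);

// Read 4 bytes starting at byte offset 7.
$part = file_get_contents($path, false, null, 7, 4);

// A missing file yields false (and a warning, silenced here with @).
$missing = @file_get_contents($path . '.does-not-exist');

echo $all, "\n";
echo $part, "\n";
var_dump($missing === false);

// Clean up the temporary file.
unlink($path);
```

Comparing the result with === false matters because an empty file returns an empty string, which is also falsy.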

Example


<?php
$homepage = file_get_contents('http://www.hzhuti.com/');
echo $homepage;
?>


In this way, $homepage holds the content of the page we fetched. With that covered, let's get started.

Example. The code is as follows:

<?php
// Fetch the raw HTML of a page
function fetch_urlpage_contents($url)
{
    $c = file_get_contents($url);
    return $c === false ? "" : $c;
}

// Get the content between the $begin and $end markers
function fetch_match_contents($begin, $end, $c)
{
    $begin = change_match_string($begin);
    $end = change_match_string($end);
    // eregi() was removed in PHP 7, so preg_match() is used instead.
    // i = case-insensitive, s = "." also matches newlines, (.*?) stops at the first end marker.
    $p = "/{$begin}(.*?){$end}/is";
    if (preg_match($p, $c, $rs)) {
        return $rs[1];
    }
    return "";
}

// Escape regular expression metacharacters in a marker string
function change_match_string($str)
{
    return preg_quote($str, "/");
}

// Collect a web page
function pick($url, $ft, $th)
{
    $c = fetch_urlpage_contents($url);
    $rs = array();
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.bkjia.com"; // The address to be collected

$ft["title"]["begin"] = "<title>"; // Start point of interception
$ft["title"]["end"] = "</title>"; // End point of interception
$th["title"]["中山"] = "广东"; // Replace "中山" (Zhongshan) with "广东" (Guangdong) in the intercepted part

$ft["body"]["begin"] = "<body>"; // Start point of interception
$ft["body"]["end"] = "</body>"; // End point of interception
$th["body"]["中山"] = "广东"; // Replacement applied to the intercepted part

$rs = pick($url, $ft, $th); // Start collection

echo $rs["title"];
echo $rs["body"]; // Output
?>
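To try the begin/end marker idea without fetching a live page, the extraction can be run against an in-memory string. This is only a sketch: the helper name extract_between() and the sample HTML are assumptions made for illustration, and preg_match() with preg_quote() stands in for the pattern matching used in the article.

```php
<?php
// Hypothetical helper: return the text between the first $begin marker
// and the next $end marker, or an empty string when nothing matches.
function extract_between(string $html, string $begin, string $end): string
{
    $pattern = '/' . preg_quote($begin, '/') . '(.*?)' . preg_quote($end, '/') . '/is';
    return preg_match($pattern, $html, $m) ? $m[1] : '';
}

// Inline sample page (an assumption, used instead of a live URL).
$html = '<html><head><title>Zhongshan News</title></head>'
      . '<body><p>Hello from Zhongshan</p></body></html>';

$title = extract_between($html, '<title>', '</title>');
$body  = extract_between($html, '<body>', '</body>');

// Apply the same kind of old => new replacement map as $th.
$replacements = ['Zhongshan' => 'Guangdong'];
foreach ($replacements as $old => $new) {
    $title = str_replace($old, $new, $title);
}

echo $title, "\n";
echo $body, "\n";
```

Escaping the markers with preg_quote() keeps characters such as "/", "$" or "." in a marker from being interpreted as regular expression syntax.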

The following code is adapted from the previous example and is used to extract hyperlinks, e-mail addresses, or other specific content from a web page. The code is as follows:

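$ft = array(); $th = array(); // Reset the markers and replacements from the previous example

$url = "http://www.bkjia.com"; // The address to be collected
$ft["a"]["begin"] = '<a'; // Start point of interception
$ft["a"]["end"] = '>'; // End point of interception

$rs = pick($url, $ft, $th); // Start collection

print_r($rs["a"]);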
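The pick() call above returns a single string per key, so collecting every hyperlink or e-mail address on a page needs a function that returns all matches. One possible approach is preg_match_all(). The sketch below is an assumption rather than the article's own method: the helper names, the patterns, and the sample HTML are all hypothetical, and the e-mail pattern is deliberately simple.

```php
<?php
// Hypothetical helper: collect the href value of every <a> tag in $html.
function extract_links(string $html): array
{
    // Group 1 captures the opening quote, group 2 the URL, \1 requires the same closing quote.
    preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $m);
    return $m[2];
}

// Hypothetical helper: collect strings that look like e-mail addresses.
function extract_emails(string $html): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $m);
    // Drop duplicates and reindex the result.
    return array_values(array_unique($m[0]));
}

// Inline sample markup (an assumption, used instead of a live URL).
$html = '<p><a href="http://www.bkjia.com/a.html">A</a> '
      . "<a class='x' href='/b.html'>B</a> "
      . 'Contact: <a href="mailto:admin@example.com">admin@example.com</a></p>';

$links  = extract_links($html);
$emails = extract_emails($html);

print_r($links);
print_r($emails);
```

For badly formed markup, a parser such as DOMDocument is usually more reliable than a regular expression.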


The following function uses cURL to fetch a specified page while sending a custom User-Agent and Referer. The code is as follows:

<?php
// Fetch a specified page
function GetSources($Url, $User_Agent = '', $Referer_Url = '')
{
    // $Url: address of the page to fetch
    // $User_Agent: the user_agent string to send, e.g. "baiduspider" or "googlebot"
    // $Referer_Url: the Referer header to send
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_USERAGENT, $User_Agent);
    curl_setopt($ch, CURLOPT_REFERER, $Referer_Url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response instead of printing it
    $MySources = curl_exec($ch);
    curl_close($ch);
    return $MySources;
}

$Url = "http://www.bkjia.com"; // The page whose content is to be fetched
$User_Agent = "baiduspider+(+http://www.baidu.com/search/spider.htm)";
$Referer_Url = 'http://www.jb51.net/';
echo GetSources($Url, $User_Agent, $Referer_Url);
?>
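Where the cURL extension is not available, similar request headers can be sent through the $context parameter of file_get_contents() shown in the syntax earlier. The following is a rough sketch under that assumption. The helper name build_http_context() and the timeout value are hypothetical, and the actual request is left commented out so the example runs without network access.

```php
<?php
// Hypothetical helper: build a stream context that sends a custom
// User-Agent and Referer with HTTP requests.
function build_http_context(string $userAgent, string $referer, int $timeout = 10)
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'header'          => "Referer: {$referer}\r\n",
            'follow_location' => 1,        // Follow redirects
            'timeout'         => $timeout, // Seconds (assumed value)
        ],
    ]);
}

$context = build_http_context(
    'baiduspider+(+http://www.baidu.com/search/spider.htm)',
    'http://www.jb51.net/'
);

// Uncomment to perform the request (needs network access and allow_url_fopen=On):
// $html = file_get_contents('http://www.bkjia.com', false, $context);

// Inspect the options that would be sent.
$options = stream_context_get_options($context);
print_r($options);
```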
