I used to rely on the file_get_contents function to fetch remote content. I knew that curl existed and was well regarded, but at a glance it looked rather complicated, not as simple as file_get_contents, and I had little need for it, so I never learned it.
That changed recently when I was writing a web scraper and found that file_get_contents could no longer do the job. My conclusion is that for reading remote content, file_get_contents is more convenient than curl but otherwise not as good.
Some comparisons between curl and file_get_contents in PHP
Main differences:
After some study I found that curl supports many protocols, including FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP, which means it can do many things that file_get_contents cannot. With curl, PHP can fetch and scrape remote content, upload and download over FTP from a web page, simulate logins, integrate with APIs and transfer data, simulate cookies, resume interrupted downloads, and more. It is very powerful.
Once I understood the basics of curl, I found that it is not difficult. The option names are a little hard to remember, but memorizing a few commonly used ones is enough.
Enable curl:
The curl extension is not enabled in PHP by default, so to use curl you first need to enable it in php.ini: remove the semicolon in front of ;extension=php_curl.dll, save the file, and restart Apache/IIS.
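Before relying on curl, it can help to confirm at runtime that the extension is actually loaded. The following is a minimal sketch; the helper name curl_is_available is made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: reports whether the curl extension appears usable.
function curl_is_available(): bool
{
    // extension_loaded() checks the list of loaded extensions;
    // function_exists() confirms that curl_init is defined.
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    echo "curl is enabled\n";
} else {
    echo "curl is not enabled; check the extension line in php.ini\n";
}
```

If the second message appears after editing php.ini, the web server has probably not been restarted, or a different php.ini is being loaded.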
Basic syntax:
$my_curl = curl_init(); // initialize a curl handle
curl_setopt($my_curl, CURLOPT_URL, "http://www.jb51.net"); // set the URL to fetch
curl_setopt($my_curl, CURLOPT_RETURNTRANSFER, 1); // 1 returns the result as a string instead of printing it to the screen
$str = curl_exec($my_curl); // execute the request
echo $str; // print the fetched result
curl_close($my_curl); // close the curl handle
Recently I needed to fetch music data from other people's websites. I used the file_get_contents function, but the requests kept failing. I set the timeout as shown in the manual's examples, yet most of the time it had no effect:
$config['context'] = stream_context_create(array('http' => array('method' => "GET",
'timeout' => 5//This timeout is unstable and often does not work
)
));
Looking at the server's connection pool at that point, I would find a pile of errors like the following, which gave me a huge headache:
file_get_contents(http://***): failed to open stream...
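One way to keep such failures from flooding the output is to check the return value explicitly, since file_get_contents returns false when the stream cannot be opened. The sketch below is only an illustration; the helper name fetch_or_null and its null-on-failure convention are assumptions, not something from the original code.

```php
<?php
// Hypothetical helper: returns the body as a string, or null on failure.
function fetch_or_null(string $url, float $timeoutSeconds = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds, // read timeout in seconds
        ],
    ]);

    // @ suppresses the "failed to open stream" warning; the return value
    // is inspected instead, because file_get_contents returns false on failure.
    $body = @file_get_contents($url, false, $context);

    if ($body === false) {
        $error = error_get_last();
        error_log('fetch failed for ' . $url . ': ' . ($error['message'] ?? 'unknown error'));
        return null;
    }

    return $body;
}
```

This does not make the timeout itself any more reliable; it only turns a failed fetch into a value the caller can test for.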
I now use the curl library instead, and wrote a replacement function:
function curl_file_get_contents($durl) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $durl);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    // _USERAGENT_ and _REFERER_ are constants assumed to be defined elsewhere
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_);
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $r = curl_exec($ch);
    curl_close($ch);
    return $r;
}
Since then, apart from genuine network problems, no other issues have occurred.
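The wrapper above returns false on failure without saying why. If the reason matters, curl_error can be read from the handle after curl_exec. The following is a rough sketch under that assumption; the function name curl_fetch_with_error and the [body, error] return shape are invented for illustration.

```php
<?php
// Hypothetical variant of the wrapper: returns [body, error].
// On success error is null; on failure body is null.
function curl_fetch_with_error(string $url, int $timeoutSeconds = 5): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit for the connection phase
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole transfer
    ]);

    $body = curl_exec($ch);
    // Read the error message from the handle while it is still in scope.
    $error = ($body === false) ? curl_error($ch) : null;
    // The handle is released when $ch goes out of scope.

    return [$body === false ? null : $body, $error];
}

// Example usage (commented out so the sketch performs no network access):
// [$body, $error] = curl_fetch_with_error('http://www.domain.com/');
// if ($error !== null) { echo "fetch failed: $error\n"; }
```

Setting both CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT bounds the connection phase and the total transfer time separately.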
Here is a benchmark of curl versus file_get_contents run by someone else:
Seconds taken by file_get_contents to fetch google.com:
2.31319094
2.30374217
2.21512604
3.30553889
2.30124092
Seconds taken by curl:
0.68719101
0.64675593
0.64326
0.81983113
0.63956594
A big gap? In my experience, the two differ not only in speed but also in stability.
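The original benchmark script is not shown, but timings of this kind could be collected with a small harness along the following lines. This is only a sketch; the helper name time_runs is an assumption, and the callable passed in can wrap whichever fetch method is being measured.

```php
<?php
// Hypothetical helper: calls $fetch $runs times and returns the elapsed
// wall-clock seconds for each run.
function time_runs(callable $fetch, int $runs = 5): array
{
    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fetch();
        $timings[] = microtime(true) - $start;
    }
    return $timings;
}

// Example usage (commented out so the sketch performs no network access):
// print_r(time_runs(function () { file_get_contents('http://www.domain.com/'); }));
```

Results from a harness like this depend heavily on DNS, network conditions and the target server, so several runs at different times give a fairer picture than a single batch.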
If you need stable network data fetching, I recommend the curl_file_get_contents function above. It is stable and fast, and by setting the user agent and referer it can also present itself to the target site as a browser.
Method 1: Use file_get_contents to fetch the content with a GET request
<?php
$url = 'http://www.domain.com/';
$html = file_get_contents($url);
echo $html;
?>
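Method 2: Use fopen to open the URL and fetch the content with a GET request
<?php
$url = 'http://www.domain.com/'; // same URL as in Method 1
$fp = fopen($url, 'r');
$meta = stream_get_meta_data($fp); // for HTTP, the response headers are in $meta['wrapper_data']
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 1024);
}
echo "url body: $result";
fclose($fp);
?>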
Method 3: Use the file_get_contents function to send a POST request to the URL
<?php
$data = array('foo' => 'bar');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$context = stream_context_create($opts);
$html = file_get_contents('http://localhost/e/admin/test.html', false, $context);
echo $html;
?>
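The options array in Method 3 can be built and inspected without sending anything, which makes it easier to see what http_build_query produces and how Content-Length is derived from it. The helper name build_post_options below is hypothetical, and the field values are made-up sample data.

```php
<?php
// Hypothetical helper: builds the options array for stream_context_create().
function build_post_options(array $fields): array
{
    // http_build_query URL-encodes the fields and joins them with "&".
    $body = http_build_query($fields);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
        ],
    ];
}

$options = build_post_options(['foo' => 'bar', 'q' => 'a b']);
echo $options['http']['content'], "\n"; // prints foo=bar&q=a+b
```

The resulting array can be passed to stream_context_create and then to file_get_contents as the third argument, as in Method 3.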
Method 4: Use the fsockopen function to open the URL and fetch the complete response, header and body included, with a GET request
<?php
function get_url($url, $cookie = false)
{
    $url = parse_url($url);
    $path = isset($url['path']) ? $url['path'] : '/';
    $query = isset($url['query']) ? $path . '?' . $url['query'] : $path;
    echo "Query:" . $query;
    $fp = fsockopen($url['host'], isset($url['port']) ? $url['port'] : 80, $errno, $errstr, 30);
    if (!$fp) {
        return false;
    }
    $request = "GET $query HTTP/1.1\r\n";
    $request .= "Host: {$url['host']}\r\n";
    $request .= "Connection: Close\r\n";
    if ($cookie) {
        $request .= "Cookie: $cookie\r\n";
    }
    $request .= "\r\n";
    fwrite($fp, $request);
    $result = '';
    while (!feof($fp)) {
        $result .= @fgets($fp, 1024);
    }
    fclose($fp);
    return $result;
}

// Get the HTML part of the URL, with the header removed
function GetUrlHTML($url, $cookie = false)
{
    $rowdata = get_url($url, $cookie);
    if ($rowdata) {
        $body = stristr($rowdata, "\r\n\r\n");
        $body = substr($body, 4);
        return $body;
    }
    return false;
}
?>
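Because a raw socket returns the header and body as one string, it can be useful to split the response into its parts rather than only stripping the header. The sketch below is one possible way to do that; the function name split_http_response is an assumption, and it deliberately ignores chunked transfer encoding and repeated header names.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into status line,
// headers (lower-cased names) and body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip lines that are not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    return [
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => $parts[1] ?? '',
    ];
}

$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
$response = split_http_response($raw);
echo $response['headers']['content-type'], "\n"; // prints text/html
```

Lower-casing the header names makes lookups independent of how the server capitalizes them.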
Method 5: Use the fsockopen function to open the URL and fetch the complete response, header and body included, with a POST request
<?php
function HTTP_Post($URL, $data, $cookie, $referrer = "")
{
    // parse the given URL
    $URL_Info = parse_url($URL);
    // build the referrer; if none is given, fall back to a placeholder value
    if ($referrer == "") {
        $referrer = "111";
    }
    // build the form-encoded string from $data
    $values = array();
    foreach ($data as $key => $value) {
        $values[] = "$key=" . urlencode($value);
    }
    $data_string = implode("&", $values);
    // find out which port is needed; if not given, use the standard one (80)
    if (!isset($URL_Info["port"])) {
        $URL_Info["port"] = 80;
    }
    // build the POST request
    $request = "POST " . $URL_Info["path"] . " HTTP/1.1\r\n";
    $request .= "Host: " . $URL_Info["host"] . "\r\n";
    $request .= "Referer: $referrer\r\n";
    $request .= "Content-type: application/x-www-form-urlencoded\r\n";
    $request .= "Content-length: " . strlen($data_string) . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "Cookie: $cookie\r\n";
    $request .= "\r\n";
    $request .= $data_string;
    $fp = fsockopen($URL_Info["host"], $URL_Info["port"]);
    fputs($fp, $request);
    $result = '';
    while (!feof($fp)) {
        $result .= fgets($fp, 1024);
    }
    fclose($fp);
    return $result;
}
?>
Method 6: Use the curl library. Before using it, you may need to check that the curl extension has been enabled in php.ini
<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, 'http://www.domain.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
?>
In PHP, curl, fsockopen, and file_get_contents can all be used for scraping and for simulating posts. What are the differences between the three, and is there anything to watch out for?
Zhao Yongbin:
Sometimes file_get_contents() times out and reports an error when fetching external files. Switching to curl fixes it, though I am not sure of the exact reason.
curl is more efficient than file_get_contents() and fsockopen(), because curl automatically caches DNS information (that is the key point, though I still need to test it myself).
Fan Jiapeng:
Choosing among file_get_contents, curl, and fsockopen depends on the requirements of the request at hand; there is no one-size-fits-all answer.
Judging from the KBI application developed by our company:
At first we used: file_get_contents
Later we used: fsockopen
Finally, and up to now, we use: curl
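(For remote access) my personal understanding is as follows (please point out anything that is wrong, and add anything I have missed):
file_get_contents requires allow_url_fopen to be enabled in php.ini. For HTTP requests it uses http_fopen_wrapper, which does not keep the connection alive; curl can.
A single file_get_contents() call is efficient, and it returns the content without the header information.
This is no problem when reading ordinary files, but problems appear when reading remote content.
If you need a persistent connection to request several pages repeatedly, file_get_contents and fopen will run into trouble.
The content retrieved may also be incorrect, so for scraping-style work there will certainly be problems.
Sockets are lower level, tedious to configure, and harder to work with. They return the complete response.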
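For the persistent-connection case described above, one commonly used approach with curl is to reuse a single handle for several requests, which lets libcurl reuse connections to the same host where the server allows it. The following is a minimal sketch under that assumption; the function name fetch_all_with_one_handle is hypothetical.

```php
<?php
// Hypothetical helper: fetches several URLs with one reused curl handle.
// Returns an array mapping each URL to its body, or to null on failure.
function fetch_all_with_one_handle(array $urls, int $timeoutSeconds = 5): array
{
    $results = [];
    if ($urls === []) {
        return $results; // nothing to fetch, so no handle is created
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    foreach ($urls as $url) {
        // Only the URL changes; the other options persist on the handle.
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        $results[$url] = ($body === false) ? null : $body;
    }

    return $results;
}

// Example usage (commented out so the sketch performs no network access):
// $pages = fetch_all_with_one_handle(['http://www.domain.com/a', 'http://www.domain.com/b']);
```

Whether a connection is actually kept alive still depends on the server and on the protocol version in use.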
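Pan Shaoning (Tencent):
file_get_contents can fetch the content of a URL, but it cannot post or get.
curl, on the other hand, can do both POST and GET, and can also retrieve the header information.
Sockets are lower level still; you can choose to communicate over UDP or TCP.
Anything file_get_contents and curl can do, a socket can do too.
What a socket can do, curl cannot necessarily do.
Most of the time file_get_contents is used simply to pull data; it is fairly efficient and fairly simple.
I have also run into the situation Zhao described, and setting the host through curl fixed it. It depends on the network environment.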
<?php
/**
 * Socket version
 * Usage:
 * $post_string = "app=socket&version=beta";
 * request_by_socket('jb51.net', '/restServer.php', $post_string);
 */
function request_by_socket($remote_server, $remote_path, $post_string, $port = 80, $timeout = 30)
{
    $socket = fsockopen($remote_server, $port, $errno, $errstr, $timeout);
    if (!$socket) die("$errstr($errno)");
    $body = "mypost=$post_string";
    fwrite($socket, "POST $remote_path HTTP/1.0\r\n");
    fwrite($socket, "User-Agent: Socket Example\r\n");
    fwrite($socket, "Host: $remote_server\r\n");
    fwrite($socket, "Content-type: application/x-www-form-urlencoded\r\n");
    fwrite($socket, "Content-length: " . strlen($body) . "\r\n");
    fwrite($socket, "Accept: */*\r\n");
    fwrite($socket, "\r\n");
    fwrite($socket, $body);
    // read the response headers up to the first blank line
    $header = "";
    while ($str = trim(fgets($socket, 4096))) {
        $header .= $str;
    }
    // read the response body
    $data = "";
    while (!feof($socket)) {
        $data .= fgets($socket, 4096);
    }
    fclose($socket);
    return $data;
}

/**
 * Curl version
 * Usage:
 * $post_string = "app=request&version=beta";
 * request_by_curl('http://jb51.net/restServer.php', $post_string);
 */
function request_by_curl($remote_server, $post_string)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $remote_server);
    curl_setopt($ch, CURLOPT_POSTFIELDS, 'mypost=' . $post_string);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, "Jimmy's CURL Example beta");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

/**
 * Other version (file_get_contents)
 * Usage:
 * $post_string = "app=request&version=beta";
 * request_by_other('http://jb51.net/restServer.php', $post_string);
 */
function request_by_other($remote_server, $post_string)
{
    $body = 'mypost=' . $post_string;
    $context = array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                         "User-Agent: Jimmy's POST Example beta\r\n" .
                         'Content-length: ' . strlen($body),
            'content' => $body
        )
    );
    $stream_context = stream_context_create($context);
    $data = file_get_contents($remote_server, FALSE, $stream_context);
    return $data;
}
?>