I have worked on many products that scrape content from other websites, and I am used to the convenient and fast file_get_contents function, but fetches kept failing on me. Even when I set a timeout following the examples in the manual, it did not work most of the time:
$config['context'] = stream_context_create(array(
    'http' => array(
        'method'  => "GET",
        'timeout' => 5 // this timeout is unreliable and often has no effect
    )
));
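For completeness, the context then has to be passed as the third argument of file_get_contents; here is a minimal sketch of that call (the URL is a placeholder, not from the original post):

$html = file_get_contents('http://example.com/', false, $config['context']);
if ($html === false) {
    // On failure, this is exactly where the "failed to open stream" warning appears.
    echo "fetch failed\n";
}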
When that happens, a look at the server's connection pool turns up a pile of similar errors, which is a real headache:
file_get_contents(http://***): failed to open stream...
As a last resort, I installed the cURL extension and wrote a replacement function:
function curl_file_get_contents($durl) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $durl);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);              // hard timeout in seconds
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_);  // pretend to be a browser
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);      // fake the referer
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);       // return the body instead of printing it
    $r = curl_exec($ch);
    curl_close($ch);
    return $r;
}
With this approach, nothing has gone wrong except for genuine network problems.
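For illustration, here is a usage sketch of the function above; the constant values and URL are my own placeholders, not from the original post:

// The function expects these constants to be defined somewhere; these values are assumed.
define('_USERAGENT_', 'Mozilla/5.0 (compatible; MyFetcher/1.0)');
define('_REFERER_', 'http://www.example.com/');

$html = curl_file_get_contents('http://www.example.com/page.html');
if ($html === false) {
    // curl_exec() returns false on failure, e.g. when the 5-second timeout is hit.
    echo "fetch failed\n";
}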
Here is a benchmark of curl versus file_get_contents that someone else ran:
Seconds taken by file_get_contents to fetch google.com:
2.31319094
2.30374217
2.21512604
3.30553889
2.30124092
Seconds taken by curl:
0.68719101
0.64675593
0.64326
0.81983113
0.63956594
Quite a gap, right? Haha. In my experience, the two differ not only in speed but also in stability. If you need reliable network scraping, I recommend the curl_file_get_contents function above: it is stable and fast, and it can also masquerade as a browser by spoofing the User-Agent and Referer sent to the target site!
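If you want to reproduce the comparison yourself, a timing harness along these lines would produce comparable figures; it is only a sketch, and the URL and single-run measurement are assumptions:

// Time a single fetch with whichever fetcher is passed in (a plain function name).
function time_fetch($fetcher, $url) {
    $start = microtime(true);
    $fetcher($url);                      // body is discarded; only elapsed time matters
    return microtime(true) - $start;     // seconds, matching the figures above
}

$url = 'http://www.google.com/';
printf("file_get_contents: %.8f s\n", time_fetch('file_get_contents', $url));
printf("curl:              %.8f s\n", time_fetch('curl_file_get_contents', $url));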
Special note: results may differ between PHP versions. Under PHP 5.2, file_get_contents is particularly inefficient and tends to eat too much CPU; upgrading to PHP 5.3 is recommended, and in my testing the problem no longer occurs there.