While trying to extract some data from Google search results, I found that Google guards heavily against scraping by software. In the past, forging the User-Agent header was enough to capture the data, but that no longer works. Packet captures show that Google checks for cookies: when a request carries no cookies, it immediately returns a 302 redirect, followed by dozens more 302 redirects in a row, and no data can be captured at all.
Therefore, before sending a search request, you need to obtain and save the cookies first, and then send the search request with the saved cookies so that the data can be captured normally. This is really the same as simulating a forum login: first log in via POST, obtain and save the cookies, and then use those cookies for the following requests.
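To see the redirect behavior described above for yourself, a minimal sketch could request a URL once without following redirects and report the status code. The helper names `isRedirectStatus` and `probeUrl` are hypothetical and not part of the original post, and what Google actually returns today may differ from what was observed here.

```php
<?php
// Hypothetical helper (not from the original post): true for any HTTP 3xx status.
function isRedirectStatus(int $status): bool
{
    return $status >= 300 && $status < 400;
}

// Hypothetical helper: request $url once, without following redirects,
// optionally sending cookies from a previously saved cookie file.
function probeUrl(string $url, ?string $cookieFile = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // Return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // Stop at the first response
    if ($cookieFile !== null) {
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // Send the saved cookies
    }
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL); // Where the server wants to send us

    return [
        'ok'       => $body !== false,
        'status'   => $status,
        'redirect' => isRedirectStatus($status),
        'target'   => $target ? $target : null,
    ];
}
```

Calling `probeUrl()` on the search URL once without a cookie file and once with one should show whether the saved cookies change the response.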
The PHP code for the two requests is as follows:
<?php
header('Content-Type: text/html; charset=utf-8');
$cookie_file = dirname(__FILE__) . '/cookie.txt';
//$cookie_file = tempnam("tmp", "cookie");
// Step 1: request the home page first so that the cookies are set, and save them
$url = "http://www.google.com.hk";
$ch = curl_init($url); // Initialize the cURL session
curl_setopt($ch, CURLOPT_HEADER, 0); // Do not include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // Save the received cookies to this file
curl_exec($ch);
curl_close($ch);
// Step 2: send the search request with the cookies saved above
$url = "http://www.google.com.hk/search?oe=utf8&ie=utf8&source=uds&hl=zh-CN&q=qq";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // Send the cookies stored in this file
$response = curl_exec($ch);
curl_close($ch);
echo $response;
?>
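If the second request still fails, it can help to check what the first request actually saved. The sketch below assumes the cookie file is in the Netscape cookie format that cURL writes for `CURLOPT_COOKIEJAR` (seven tab-separated fields per line, with HttpOnly cookies prefixed by `#HttpOnly_`). The function name `readCookieJar` is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: read a Netscape-format cookie file into a name => value array.
function readCookieJar(string $path): array
{
    $cookies = [];
    if (!is_readable($path)) {
        return $cookies; // Nothing saved yet
    }
    foreach (file($path, FILE_IGNORE_NEW_LINES) as $line) {
        $line = rtrim($line, "\r");
        if ($line === '') {
            continue; // Skip blank lines
        }
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            $line = substr($line, 10); // HttpOnly cookie: strip the marker, keep the entry
        } elseif ($line[0] === '#') {
            continue; // Ordinary comment line
        }
        // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}
```

For example, `print_r(readCookieJar(dirname(__FILE__) . '/cookie.txt'));` after the first request lists the cookie names that were stored; an empty array suggests the cookie file was never written.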