How to scrape a website that blocks scraping

WBOY
Release: 2016-06-23 14:02:59

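I want to scrape data from a website with PHP, but I cannot get any data back from it. The URL is:
http://www.alldatasheet.com/view.jsp?Searchword=78HC
Please give it a try; anything that returns data will do. I have been trying for a long time without success.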


Replies (solutions)

<?php
// Request headers copied from a browser session.
// The request line ("GET /view.jsp?Searchword=78HC HTTP/1.1") is not a header, so it is
// left out; cURL builds it from the URL. Accept-Encoding is set through CURLOPT_ENCODING
// below, so that cURL also decompresses the response.
$header = array(
    "Host: www.alldatasheet.com",
    "Connection: keep-alive",
    "Cache-Control: max-age=0",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22",
    "Accept-Language: en-US,zh-CN;q=0.8,zh;q=0.6",
    "Accept-Charset: UTF-8,*;q=0.5",
    "Cookie: JSESSIONID=BD1418BC3F4CA9084F0C022A98687A09; track_id=117.25.173.111363310326444; seekstr=*78H*..; seekshot=78H..1..75..8..112; __utma=191189370.2036196682.1363308553.1363308553.1363308553.1; __utmb=191189370.3.10.1363308553; __utmc=191189370; __utmz=191189370.1363308553.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); arp_scroll_position=900",
);

// Initialize a cURL handle
$curl = curl_init();

// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.alldatasheet.com/view.jsp?Searchword=78HC');

// Send the browser-like headers
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);

// Advertise every encoding cURL supports and decode the body automatically
curl_setopt($curl, CURLOPT_ENCODING, '');

// Do not include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 0);

// Return the response as a string instead of printing it
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Run the request
$data = curl_exec($curl);

// Close the cURL handle
curl_close($curl);

// Show the data received (false if the request failed)
var_dump($data);

Any page that a browser can open should be scrapable.
The key is the cookie: the request has to carry the cookies the site expects.
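
A cookie pasted from a browser stops working once that session expires. One possible alternative, sketched here under assumptions, is to let cURL collect the cookies itself: request a page of the site first, then request the search page on the same handle. The function name fetchWithFreshCookies(), the choice of warm-up URL, and the User-Agent string are illustrative and not part of the original reply, and whether this is enough for this particular site has not been verified.

```php
<?php
// Sketch only: fetchWithFreshCookies(), the warm-up URL and the User-Agent
// string are illustrative assumptions, not part of the original answer.

/**
 * Request $warmUpUrl first so the server can set its cookies, then request
 * $targetUrl on the same handle so those cookies are sent back.
 * Returns the body of the second response, or null if either request fails.
 */
function fetchWithFreshCookies(string $warmUpUrl, string $targetUrl): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,  // return bodies as strings
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => '',    // accept and decode any supported compression
        CURLOPT_COOKIEFILE     => '',    // empty string: enable the in-memory cookie engine
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22',
        CURLOPT_TIMEOUT        => 30,    // seconds
    ]);

    // First request: only needed for the cookies the server sets in its response.
    curl_setopt($curl, CURLOPT_URL, $warmUpUrl);
    if (curl_exec($curl) === false) {
        return null;
    }

    // Second request: the cookie engine attaches the stored cookies automatically.
    curl_setopt($curl, CURLOPT_URL, $targetUrl);
    curl_setopt($curl, CURLOPT_REFERER, $warmUpUrl);
    $body = curl_exec($curl);

    // The handle is released when $curl goes out of scope.
    return $body === false ? null : $body;
}
```

A call might look like fetchWithFreshCookies('http://www.alldatasheet.com/', 'http://www.alldatasheet.com/view.jsp?Searchword=78HC'), where using the home page as the warm-up request is an assumption. If the site sets its cookies through JavaScript rather than response headers, this approach will not pick them up.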

source:php.cn