Why can't file_get_contents fetch http://fitness.39.net/food?
I simply ran echo file_get_contents('http://fitness.39.net/food/');
It shows:
<code>Warning: file_get_contents(http://fitness.39.net/food/) [function.file-get-contents]: failed to open stream: HTTP request failed! </code>
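I suspected the server was checking the browser User-Agent, so I set the following in php.ini:
<code>allow_url_fopen = On
user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"</code>
I restarted Apache, but it still failed with the same warning: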
<code>Warning: file_get_contents(http://fitness.39.net/food/) [function.file-get-contents]: failed to open stream: HTTP request failed! </code>
Can anyone explain this?
Replies:
I found the problem. One note up front: I tested this with Node.js.
First attempt
I started with the following code:
var spidex = require("spidex");
spidex.get("http://fitness.39.net/food/", function(html, status, respHeader) {
    console.log(html);
}, "utf8").on("error", function(err) {
    console.log(err.message);
});
The request failed with a connection error.
Hypothesis
Next I used Chrome to look at the headers sent during a normal visit and tried them one by one.
var spidex = require("spidex");
var headers = {
    "connection" : "keep-alive"
};
spidex.get("http://fitness.39.net/food/", function(html, status, respHeader) {
    console.log(html);
}, headers, "utf8").on("error", function(err) {
    console.log(err.message);
});
Still a connection error, until I added accept:
var spidex = require("spidex");
var headers = {
    "connection" : "keep-alive",
    "accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
};
spidex.get("http://fitness.39.net/food/", function(html, status, respHeader) {
    console.log(html);
}, headers, "utf8").on("error", function(err) {
    console.log(err.message);
});
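This time the page came back.
Conclusion
My guess is that the server validates the accept header (or something along those lines). In any case, adding an accept field to the request headers with the value text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 is enough.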
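If you don't have spidex installed, the same request can probably be reproduced with Node's built-in http module. This is only a rough sketch, not the code I actually tested with: buildOptions and fetchPage are made-up helper names, and the header values are the ones from the snippet above.

```javascript
// Sketch using only Node's built-in http module.
// buildOptions and fetchPage are hypothetical helper names.
const http = require("http");

// Accept value copied from the working spidex example.
const ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

// Build the options object for http.get, including the two headers
// that were sent when the request finally succeeded.
function buildOptions(pageUrl) {
    const u = new URL(pageUrl);
    return {
        hostname: u.hostname,
        port: u.port || 80,            // u.port is "" when the URL uses the default port
        path: u.pathname + u.search,
        headers: {
            "connection": "keep-alive",
            "accept": ACCEPT
        }
    };
}

// Fetch the page and hand (err, html, statusCode) to the callback.
function fetchPage(pageUrl, callback) {
    http.get(buildOptions(pageUrl), function(res) {
        const chunks = [];
        res.on("data", function(chunk) { chunks.push(chunk); });
        res.on("end", function() {
            callback(null, Buffer.concat(chunks).toString("utf8"), res.statusCode);
        });
    }).on("error", function(err) {
        callback(err);
    });
}
```

To try it, call fetchPage("http://fitness.39.net/food/", function(err, html, status) { console.log(err ? err.message : html); }); and remove the "accept" entry to see whether the connection error comes back.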
