Help: PHP can't fetch a web page, and nobody I've asked could solve it

Jun 23, 2016 02:17 PM

Last edited by dz215136304 at 2013-06-11 11:35:47

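The URL has to be the one in the code below. In my tests, when the value after q contains a space, the "&" gets converted to "&amp;" automatically during the fetch, so no data comes back. Entering the same URL directly in a browser does return the content. How can I fix this?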
```php
$url = "http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz claiborne&page=1&showMode=list";
echo Post($url);

// Request the web page
function Post($url, $post = null)
{
    $context = array();
    if (is_array($post)) {
        ksort($post);
        $context['http'] = array(
            'timeout' => 60,
            'method'  => 'POST',
            'header'  => ">Accept-language: en/r/n",
            'content' => http_build_query($post, '', '&'),
        );
    }
    return file_get_contents($url, false, stream_context_create($context));
}
```


Error message:
```
Warning: file_get_contents(http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz claiborne&page=1&showMode=list) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 505 HTTP Version Not Supported in F:\wwwroot\getTaobao\test.php on line 25
```


Replies (Solutions)

You could start by reading up on HTML character entities.


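file_get_contents - Reads entire file into a string

Description

string file_get_contents ( string $filename [, bool $use_include_path [, resource $context [, int $offset [, int $maxlen ]]]] )

Identical to file(), except that file_get_contents() returns the file in a string, starting at the specified offset up to maxlen bytes. On failure, file_get_contents() will return FALSE.

file_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques, if supported by your OS, to enhance performance.

Note: If you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode().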



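Also:
'header'=>">Accept-language: en/r/n"
What are the ">" and "/r/n" doing there?
The ">" is superfluous, and "/r/n" should be "\r\n".
With a malformed header, it's to be expected that the server returns an error (505).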
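For reference, here is a minimal sketch of the question's Post() helper with the header written the way this reply suggests (no leading ">", and a real CRLF "\r\n"). Note that in the original code the header is only attached when $post is an array, so a plain GET is sent with no context options at all. The separate buildContextOptions() helper and the added Content-Type header are assumptions for illustration, not part of the original code.

```php
<?php
// Sketch of the question's Post() helper with the header fixed:
// no leading ">", and a real CRLF "\r\n" instead of the literal "/r/n".
// buildContextOptions() is a hypothetical helper split out for clarity.
function buildContextOptions(?array $post = null): array
{
    $http = [
        'timeout' => 60,
        'header'  => "Accept-language: en\r\n",
    ];

    if ($post !== null) {
        ksort($post); // same parameter ordering as the original code
        $http['method']  = 'POST';
        // Assumption: the endpoint expects an ordinary form-encoded body.
        $http['header'] .= "Content-Type: application/x-www-form-urlencoded\r\n";
        $http['content'] = http_build_query($post, '', '&');
    } else {
        // The original only built a context for POST; plain GETs got none.
        $http['method'] = 'GET';
    }

    return ['http' => $http];
}

function Post(string $url, ?array $post = null)
{
    $context = stream_context_create(buildContextOptions($post));
    // Returns the response body, or false on failure.
    return file_get_contents($url, false, $context);
}
```

Called as Post($url), this sends a GET with the Accept-language header; the values inside $url still have to be URL-encoded separately.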


Even after URL-encoding, I still can't get the data. Here's the code:

```php
$url = "http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=lizclaiborne&page=1&showMode=list";
echo Post(urlencode($url));

// Request the web page
function Post($url, $post = null)
{
    $context = array();
    if (is_array($post)) {
        ksort($post);
        $context['http'] = array(
            'timeout' => 60,
            'method'  => 'POST',
            'header'  => "Accept-language: en\r\n",
            'content' => http_build_query($post, '', '&'),
        );
    }
    return file_get_contents($url, false, stream_context_create($context));
}
```
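One likely reason this version fails no matter what the server does: urlencode() is applied to the entire URL, so ":" and "/" are escaped too. A small sketch, using the URL from the question, of what file_get_contents() actually receives:

```php
<?php
// urlencode() escapes every character except letters, digits and "-_.",
// so the scheme separator "://" and every "/" are encoded as well.
$url   = 'http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz claiborne&page=1&showMode=list';
$whole = urlencode($url);

echo $whole, "\n";
// http%3A%2F%2F110.75.65.8%2Fsearch_turn_page_iphone.htm%3Fsort%3D%26q%3Dliz+claiborne%26page%3D1%26showMode%3Dlist
```

Since the result no longer starts with "http://", PHP treats it as a local file name rather than a URL. urlencode() is meant for individual parameter values, not for a complete URL.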

The actual error is: HTTP/1.1 505 HTTP Version Not Supported

file_get_contents(str_replace(' ', '%20', $url));

It works now. Their server probably just had a problem earlier.

```php
$url = "http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=lizclaiborne&page=1&showMode=list";
echo file_get_contents($url);
```
Output (truncated):
{"result":"true","totalPage":"100","catmap":"","ppath":"","category":"","auctionTagFlag1":"","auctionTagFlag2":"","auctionTagFlag3":"","listItem":[
           {"name":"团购价美国真品liz claiborne丽资克莱本女款中款钱包 liz钱包" ,"img":"http://q.i02.wimg.taobao.com/bao/uploaded/i1/T18ZyyXfXgXXXc8SLa_122312.jpg_90x90.jpg","img2":"http://q.i04.wimg.taobao.com/bao/uploaded/i1/T18ZyyXfXgXXXc8SLa_122312.jpg","iswebp":"","url":"http://a.m.taobao.com/i2431550873.htm?rn=bwHGEi1-ZClPeKBbGc1lfJhm45-D1gLR8O-pug7&sid=8b9c27255c655b1e","previewUrl":"http://a.m.taobao.com/ajax/pre_view.do?itemId=2431550873&sid=8b9c27255c655b1e","favoriteUrl":"http://fav.m.taobao.com/favorite/to_collection.htm?itemNumId=2431550873&sid=8b9c27255c655b1e",
    "icon":["0" ],
    "price":"39.00","originalPrice":"39.00","freight":"10","area":"天津","act":"月售1","itemNumId":"2431550873","nick":"金缕衣_2007",
..........

Hmm, I pasted the wrong data.
$url="http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz claiborne&page=1&showMode=list";
This one fails with HTTP/1.1 505 HTTP Version Not Supported.

Both of these work:
$url="http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz+claiborne&page=1&showMode=list";
$url="http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz%20claiborne&page=1&showMode=list";

I don't know how their server is configured, but it doesn't accept data that hasn't been URL-encoded.
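The two working URLs above correspond to PHP's two encoding functions: urlencode() turns a space into "+" (form encoding), while rawurlencode() turns it into "%20" (RFC 3986). A short sketch that encodes only the value of q:

```php
<?php
// Encode only the value of q; the rest of the URL is already safe.
$keyword = 'liz claiborne';

$plus    = urlencode($keyword);     // "liz+claiborne"
$percent = rawurlencode($keyword);  // "liz%20claiborne"

$base = 'http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=';
$tail = '&page=1&showMode=list';

$urlPlus    = $base . $plus . $tail;
$urlPercent = $base . $percent . $tail;

echo $urlPlus, "\n";
echo $urlPercent, "\n";
```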



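Can a server accept "non-URL-encoded data" at all? My understanding is that a server can only accept URL-encoded data. When we open an address containing a space directly in a browser, the browser has already URL-encoded it automatically, so it opens fine. But PHP isn't a browser, so it doesn't do this automatically, and you have to encode the URL by hand. Isn't that how it works?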






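The space character (\x20) is a legal URL character; how it's handled depends on the server.
If you've ever worked with HTTP over raw sockets, you'll know that sending a URL containing spaces in the request header is generally acceptable too.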
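To make the socket-level view concrete, here is a sketch of the request line a client sends, split on spaces the way a simple server might parse it. It assumes the server expects exactly three space-separated fields ("METHOD target HTTP-version"); that is one plausible explanation for the 505 in this thread, not something confirmed about this particular server.

```php
<?php
// Sketch: the request line a client sends, and a naive server-side split.
// Assumption: the server expects exactly "METHOD SP target SP HTTP-version".
function requestLine(string $target): string
{
    return "GET {$target} HTTP/1.1";
}

$raw     = '/search_turn_page_iphone.htm?sort=&q=liz claiborne&page=1&showMode=list';
$encoded = '/search_turn_page_iphone.htm?sort=&q=liz%20claiborne&page=1&showMode=list';

$rawParts     = explode(' ', requestLine($raw));
$encodedParts = explode(' ', requestLine($encoded));

// Raw space: 4 fields, and the third one is not an HTTP version.
echo count($rawParts), ' fields, third = ', $rawParts[2], "\n";
// Encoded space: the expected 3 fields, ending in "HTTP/1.1".
echo count($encodedParts), ' fields, third = ', $encodedParts[2], "\n";
```

A parser that reads the third field as the protocol version would see "claiborne&page=1&showMode=list" in the raw case, which would fit a "505 HTTP Version Not Supported" response.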





So you mean that whatever characters the query string contains, the server receives all of them exactly as they are?


The correct way to write it is:
$url="http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=". urlencode('liz claiborne') . "&page=1&showMode=list";
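A variant of the same idea, as a sketch: let http_build_query() encode every parameter at once. Passing '&' as the third argument pins the separator regardless of the arg_separator.output ini setting, which some setups change to "&amp;".

```php
<?php
// Build the query string from an array so every value is encoded.
// The third argument fixes the separator to "&" instead of relying on
// the arg_separator.output ini setting.
$params = [
    'sort'     => '',
    'q'        => 'liz claiborne',
    'page'     => 1,
    'showMode' => 'list',
];

$query = http_build_query($params, '', '&');
$url   = 'http://110.75.65.8/search_turn_page_iphone.htm?' . $query;

echo $url, "\n";
// http://110.75.65.8/search_turn_page_iphone.htm?sort=&q=liz+claiborne&page=1&showMode=list
```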







What about newlines and the "/" character? Of course it isn't every character.
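A quick sketch of how urlencode() treats the kinds of characters mentioned here. Inside a query value, "&" would start a new parameter and a raw newline would end the HTTP request line early; urlencode() also escapes "/", even though RFC 3986 permits it inside a query.

```php
<?php
// Encode a few sample query values and keep the results for inspection.
$samples = ["a/b", "a&b", "a\nb", "a b"];
$encoded = array_map('urlencode', $samples);

foreach ($samples as $i => $raw) {
    // json_encode() makes the newline visible as \n in the output.
    echo json_encode($raw), ' => ', $encoded[$i], "\n";
}
```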

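I've run into this problem before. Pull the "&" out on its own. For example, instead of http://www.123.com?id=123&num=123, write
$url='http://www.123.com?id=123'.'&'.'num=123';
so that the compiler treats it as a plain string and doesn't convert it.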
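A sketch for checking this suggestion: concatenating "&" separately yields the very same string, so PHP itself does not turn "&" into "&amp;" when the string is built. An "&amp;" typically appears when the URL is HTML-escaped for display, for example via htmlspecialchars(); that this is where the question's "&amp;" came from is an assumption, since the thread doesn't say.

```php
<?php
// Concatenating "&" separately produces exactly the same string.
$joined = 'http://www.123.com?id=123' . '&' . 'num=123';
$plain  = 'http://www.123.com?id=123&num=123';

var_dump($joined === $plain);        // bool(true)

// HTML escaping is what turns "&" into "&amp;".
echo htmlspecialchars($plain), "\n"; // http://www.123.com?id=123&amp;num=123
```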

Just encode it with urlencode().
