Table of Contents
回复内容:
Home Backend Development PHP Tutorial php curl抓取不到页面及来路问题?

php curl抓取不到页面及来路问题?

Jun 17, 2016 am 08:31 AM
curl url

$url = "mp.weixinbridge.com/mp/";
$ch = curl_init();
$timeout = 1;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
相对路径的图片不能显示,如何使相对路径的图片正常显示?

回复内容:

谢邀,哥们,抓不到数据是因为:
1.你没有写header
2.没有写cookie,
3.没有针对https的url特殊设置
所以没有抓到数据,好好研究我写的这个代码,这个是可以抓到数据的。
要是帮到了你,给哥点个赞,支持下。
<?php

	$url = "https://www.zhihu.com/";

	$ch = curl_init();
    // 设置浏览器的特定header
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        "Host: www.zhihu.com",
        "Connection: keep-alive",
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Upgrade-Insecure-Requests: 1",
        "DNT:1",
        "Accept-Language: zh-CN,zh;q=0.8,en-GB;q=0.6,en;q=0.4,en-US;q=0.2",
        'Cookie:_za=4540d427-eee1-435a-a533-66ecd8676d7d; __utma=51854390.3169871.1440319332.1441339521.1442067491.5; __utmz=51854390.1442067491.5.5.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utmv=51854390.100-1|2=registration_date=20140525=1^3=entry_date=20140525=1; q_c1=efa8c4ccdba04f63a0ba88845f485836|1451394239000|1440047640000; _xsrf=20c250b28098f92459cac05a3944d48d; cap_id="ZWQ5OGIzN2JiZWNmNGRlNGE3YTE1MTE0YTA5YjY1NjE=|1451394239|0efd13fc965c43c0fb6a7a2523b5dac4d1dac7e3"; z_c0="QUFCQXRLa3ZBQUFYQUFBQVlRSlZUY29ScWxZN0k3T1BHaFdqb1JNVlVZekNnZ0trU0xXdEdnPT0=|1451394250|02ed77acc81edbf2340fd0ce1b13618862b3674e"; unlock_ticket="QUFCQXRLa3ZBQUFYQUFBQVlRSlZUZEtMZ2xiM21FNDRmdzdsX1NnOVdieUp3M1VtY0RsaUVBPT0=|1451394250|8cf44cefb523b2973eca01f0918ef97fc03a4994"',
		
		));
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0');
    // 在HTTP请求头中"Referer: "的内容。
    curl_setopt($ch, CURLOPT_REFERER,"https://www.baidu.com/s?word=%E7%9F%A5%E4%B9%8E&tn=sitehao123&ie=utf-8&ssl_sample=normal&f=3&rsp=0");
    curl_setopt($ch, CURLOPT_ENCODING, "gzip, deflate, sdch");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT,120);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);//302redirect
    // 针对https的设置
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    $html = curl_exec($ch);
    curl_close($ch);
    if($html === false) {
        echo 'Curl error: ' . curl_error($ch) . "<br>\n\r";
    } else {
		echo $html;
	}
Copy after login
我在暑假的时候爬过知乎,而且就是用的php+curl。知乎是有反爬虫机制的,你要尽量伪装成浏览器,包括header、useragent、cookie等等都设成浏览器上的一样,至于这些在哪可以看到请善用chrome的F12控制台。光伪装成浏览器是不够的,因为知乎有的页面是gzip加密的哦,所以你还要做好gzip解密的措施。如果知乎觉得你的行为可疑,知乎还会不定频率的给你返回空白页面,所以你还要做好数据验证的措施。总的来说爬知乎是不难的,但要稳定可靠的一口气爬完整个知乎还是很困难的。 curl配置增加cookie信息和header头部试试看,有些网站防采集需要尽可能的模拟。
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to realize the mutual conversion between CURL and python requests in python How to realize the mutual conversion between CURL and python requests in python May 03, 2023 pm 12:49 PM

Both curl and Pythonrequests are powerful tools for sending HTTP requests. While curl is a command-line tool that allows you to send requests directly from the terminal, Python's requests library provides a more programmatic way to send requests from Python code. The basic syntax for converting curl to Pythonrequestscurl command is as follows: curl[OPTIONS]URL When converting curl command to Python request, we need to convert the options and URL into Python code. Here is an example curlPOST command: curl-XPOST https://example.com/api

Tutorial on updating curl version under Linux! Tutorial on updating curl version under Linux! Mar 07, 2024 am 08:30 AM

To update the curl version under Linux, you can follow the steps below: Check the current curl version: First, you need to determine the curl version installed in the current system. Open a terminal and execute the following command: curl --version This command will display the current curl version information. Confirm available curl version: Before updating curl, you need to confirm the latest version available. You can visit curl's official website (curl.haxx.se) or related software sources to find the latest version of curl. Download the curl source code: Using curl or a browser, download the source code file for the curl version of your choice (usually .tar.gz or .tar.bz2

PHP function introduction—get_headers(): Get the response header information of the URL PHP function introduction—get_headers(): Get the response header information of the URL Jul 25, 2023 am 09:05 AM

PHP function introduction—get_headers(): Overview of obtaining the response header information of the URL: In PHP development, we often need to obtain the response header information of the web page or remote resource. The PHP function get_headers() can easily obtain the response header information of the target URL and return it in the form of an array. This article will introduce the usage of get_headers() function and provide some related code examples. Usage of get_headers() function: get_header

PHP8.1 released: Introducing curl for concurrent processing of multiple requests PHP8.1 released: Introducing curl for concurrent processing of multiple requests Jul 08, 2023 pm 09:13 PM

PHP8.1 released: Introducing curl for concurrent processing of multiple requests. Recently, PHP officially released the latest version of PHP8.1, which introduced an important feature: curl for concurrent processing of multiple requests. This new feature provides developers with a more efficient and flexible way to handle multiple HTTP requests, greatly improving performance and user experience. In previous versions, handling multiple requests often required creating multiple curl resources and using loops to send and receive data respectively. Although this method can achieve the purpose

Why NameResolutionError(self.host, self, e) from e and how to solve it Why NameResolutionError(self.host, self, e) from e and how to solve it Mar 01, 2024 pm 01:20 PM

The reason for the error is NameResolutionError(self.host,self,e)frome, which is an exception type in the urllib3 library. The reason for this error is that DNS resolution failed, that is, the host name or IP address attempted to be resolved cannot be found. This may be caused by the entered URL address being incorrect or the DNS server being temporarily unavailable. How to solve this error There may be several ways to solve this error: Check whether the entered URL address is correct and make sure it is accessible Make sure the DNS server is available, you can try using the "ping" command on the command line to test whether the DNS server is available Try accessing the website using the IP address instead of the hostname if behind a proxy

From start to finish: How to use php extension cURL to make HTTP requests From start to finish: How to use php extension cURL to make HTTP requests Jul 29, 2023 pm 05:07 PM

From start to finish: How to use php extension cURL for HTTP requests Introduction: In web development, it is often necessary to communicate with third-party APIs or other remote servers. Using cURL to make HTTP requests is a common and powerful way. This article will introduce how to use PHP to extend cURL to perform HTTP requests, and provide some practical code examples. 1. Preparation First, make sure that php has the cURL extension installed. You can execute php-m|grepcurl on the command line to check

How to get your Steam ID in a few steps? How to get your Steam ID in a few steps? May 08, 2023 pm 11:43 PM

Nowadays, many Windows users who love games have entered the Steam client and can search, download and play any good games. However, many users' profiles may have the exact same name, making it difficult to find a profile or even link a Steam profile to other third-party accounts or join Steam forums to share content. The profile is assigned a unique 17-digit id, which remains the same and cannot be changed by the user at any time, whereas the username or custom URL can. Regardless, some users don't know their Steamid, and it's important to know this. If you don't know how to find your account's Steamid, don't panic. In this article

What is the difference between html and url What is the difference between html and url Mar 06, 2024 pm 03:06 PM

Differences: 1. Different definitions, url is a uniform resource locator, and html is a hypertext markup language; 2. There can be many urls in an html, but only one html page can exist in a url; 3. html refers to is a web page, and url refers to the website address.

See all articles