How to Retrieve Page Content Using cURL Despite 'Page Moved' Errors?

Patricia Arquette
Release: 2024-10-22 20:52:03

Retrieving Page Content Using cURL

Suppose you are trying to fetch a Google search results page with cURL. You have set a user agent and various other options, but the request still returns a redirect or a "page moved" response instead of the page content.

One likely cause is the encoding of special characters in the query string. The PHP code needs to be adjusted to account for this.

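Here's the approach: a helper that follows redirects, keeps cookies between requests, and returns the transfer details together with the content: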

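<code class="php">function get_web_page($url)
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // force the GET method
        CURLOPT_POST           => false,        // make sure this is not a POST
        CURLOPT_USERAGENT      => $user_agent,  // identify as a browser
        CURLOPT_COOKIEFILE     => "cookie.txt", // read cookies from this file
        CURLOPT_COOKIEJAR      => "cookie.txt", // write cookies here on close
        CURLOPT_RETURNTRANSFER => true,         // return the body as a string
        CURLOPT_HEADER         => false,        // leave headers out of the body
        CURLOPT_FOLLOWLOCATION => true,         // follow Location: redirects
        CURLOPT_ENCODING       => "",           // accept all supported encodings
        CURLOPT_AUTOREFERER    => true,         // set Referer when redirected
        CURLOPT_CONNECTTIMEOUT => 120,          // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 120,          // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10            // give up after 10 redirects
    );

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);     // page body, or false on failure
    $err = curl_errno($ch);        // 0 when the transfer succeeded
    $errmsg = curl_error($ch);
    $header = curl_getinfo($ch);   // includes 'http_code', 'url', 'redirect_count'
    curl_close($ch);

    // Merge the error details and the body into the info array.
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}</code>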

Usage:

<code class="php">$result = get_web_page($url);

if ($result['errno'] != 0) {
    // Handle errors: bad URL, timeout, redirect loop
}

if ($result['http_code'] != 200) {
    // Handle errors: no page, no permissions, no service
}

$page = $result['content'];</code>
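
The two empty error branches in the usage example could be filled in many ways. As one possible sketch, the hypothetical helper below (page_content_or_fail() is not part of the original code) takes the array returned by get_web_page() and either returns the content or throws an exception. It assumes the array carries the 'errno', 'errmsg', 'http_code', 'url' and 'content' keys, and it uses a hand-built array in place of a real request:

<code class="php">// Hypothetical helper, shown for illustration only.
function page_content_or_fail(array $result): string
{
    // A non-zero errno means cURL itself failed (bad URL, timeout, redirect loop).
    if ((int) $result['errno'] !== 0) {
        throw new RuntimeException("cURL error {$result['errno']}: {$result['errmsg']}");
    }

    // The transfer worked, but the server did not answer with 200 OK.
    if ((int) $result['http_code'] !== 200) {
        throw new RuntimeException("Unexpected HTTP status {$result['http_code']} for {$result['url']}");
    }

    return (string) $result['content'];
}

// Hand-built result in the same shape, used here instead of a live request.
$fake = array(
    'errno'     => 0,
    'errmsg'    => '',
    'http_code' => 200,
    'url'       => 'http://example.test/',
    'content'   => '<html></html>',
);
$page = page_content_or_fail($fake);</code>

In real use, the argument would be the return value of get_web_page($url), and the call would be wrapped in a try/catch block.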

With this function, cURL follows redirects (up to 10) and stores cookies between requests, so the content returned should be that of the final page rather than the "page moved" response. Make sure any special characters in the query string are URL-encoded before passing the URL to get_web_page().
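
The function above does not encode the query string itself, so the URL has to be prepared before it is passed in. A minimal sketch of one way to do that uses PHP's built-in http_build_query(). The helper name build_search_url(), the base URL, and the example search terms are assumptions made for illustration, not part of the original code:

<code class="php">// Hypothetical helper: appends a safely encoded query string to a base URL.
function build_search_url(string $base, array $params): string
{
    // PHP_QUERY_RFC3986 percent-encodes spaces as %20 instead of "+".
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// The ampersand and quotes in the search terms are encoded, not sent raw.
$url = build_search_url(
    'https://www.google.com/search',
    array('q' => 'php & curl "page moved"')
);</code>

The resulting $url can then be passed to get_web_page($url) as in the usage example.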
