Retrieving Page Content Using cURL
You want to scrape the content of a Google search results page using cURL. Despite setting a user agent and various other options, you cannot retrieve the page content: every attempt ends in a redirect or a "page moved" error.
The likely cause is the encoding of special characters in the query string. The PHP code needs to be changed to account for this.
Here's the approach:
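<code class="php">function get_web_page($url)
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // force a GET request
        CURLOPT_POST           => false,        // do not send a POST
        CURLOPT_USERAGENT      => $user_agent,  // identify as a browser
        CURLOPT_COOKIEFILE     => "cookie.txt", // read cookies from this file
        CURLOPT_COOKIEJAR      => "cookie.txt", // write cookies to this file
        CURLOPT_RETURNTRANSFER => true,         // return the page as a string
        CURLOPT_HEADER         => false,        // leave headers out of the content
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // accept all supported encodings
        CURLOPT_AUTOREFERER    => true,         // set the Referer header on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // connection timeout in seconds
        CURLOPT_TIMEOUT        => 120,          // overall timeout in seconds
        CURLOPT_MAXREDIRS      => 10            // stop after 10 redirects
    );

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    $err     = curl_errno($ch);
    $errmsg  = curl_error($ch);
    $header  = curl_getinfo($ch);
    curl_close($ch);

    // Return the transfer info together with the error state and the page body.
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}</code>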
Usage:
<code class="php">$result = get_web_page($url);

if ($result['errno'] != 0) {
    // Handle errors: bad URL, timeout, redirect loop
}

if ($result['http_code'] != 200) {
    // Handle errors: no page, no permissions, no service
}

$page = $result['content'];</code>
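The two error branches in the usage snippet are left empty. As a rough sketch of what could go there, the hypothetical helper `describe_result()` below (not part of the original code) turns the returned array into a short status message. It assumes the array has the shape produced by `get_web_page()`, including the `url` and `redirect_count` keys that `curl_getinfo()` provides; the sample values are made up for illustration.

```php
<?php
// Hypothetical helper (an assumption, not from the original article):
// summarizes the array returned by get_web_page().
function describe_result(array $result): string
{
    if ($result['errno'] != 0) {
        // Transport-level failure: bad URL, timeout, redirect loop, ...
        return 'cURL error ' . $result['errno'] . ': ' . $result['errmsg'];
    }
    if ($result['http_code'] != 200) {
        // The server answered, but not with the requested page.
        return 'HTTP ' . $result['http_code'] . ' from ' . $result['url'];
    }
    return 'OK after ' . $result['redirect_count'] . ' redirect(s), final URL ' . $result['url'];
}

// Made-up sample data shaped like the return value of get_web_page().
$sample = array(
    'errno'          => 0,
    'errmsg'         => '',
    'http_code'      => 200,
    'url'            => 'https://example.com/',
    'redirect_count' => 1,
    'content'        => '<html></html>',
);
echo describe_result($sample), "\n";
```

Logging the final URL and the redirect count can help show where a "page moved" response is actually sending the request.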
With get_web_page(), you can retrieve the page's HTML as the server returns it to a browser-like client, with redirects followed and cookies preserved along the way. The function passes the URL to cURL unchanged, so special characters in the query string must be URL-encoded before you call it.
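The function above does not encode the URL itself. One possible way to handle special characters is to build the query string with PHP's `http_build_query()` before calling it. The sketch below is an assumption about how that might look, not the original author's code: the helper name `build_search_url()`, the base URL, and the sample query are all illustrative.

```php
<?php
// Hypothetical helper (an assumption, not from the original article):
// appends a URL-encoded query string to a base URL.
function build_search_url(string $base, array $params): string
{
    // PHP_QUERY_RFC3986 encodes spaces as %20 instead of "+";
    // the explicit '&' avoids depending on the arg_separator.output ini setting.
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Illustrative base URL and search terms containing special characters.
$url = build_search_url(
    'https://www.google.com/search',
    array('q' => 'c++ "page moved" & curl')
);
echo $url, "\n";
// prints https://www.google.com/search?q=c%2B%2B%20%22page%20moved%22%20%26%20curl
```

The resulting `$url` can then be passed to `get_web_page($url)`. For a single value, `rawurlencode()` or `urlencode()` would serve the same purpose.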