


Fixing garbled pages when crawling with curl and file_get_contents
When I used curl_init to crawl Sohu's web pages today, the fetched pages came back garbled. After some analysis, I found that the server had gzip compression enabled. Setting the CURLOPT_ENCODING option with curl_setopt makes curl decode the gzip response, and the page then comes through correctly.
Also, if the fetched page is encoded in GBK but the script itself is encoded in UTF-8, the fetched page must be converted with mb_convert_encoding.
<?php
$tmp = sys_get_temp_dir();
$cookieDump = tempnam($tmp, 'cookies');
$url = 'http://tv.sohu.com';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);          // include the response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects automatically
curl_setopt($ch, CURLOPT_TIMEOUT, 10);        // overall timeout, in seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // connection timeout, in seconds
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip, deflate')); // set the HTTP request header
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); // enable gzip decoding; harmless if the page is not gzip-compressed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieDump);   // name of the file that stores the cookies
$content = curl_exec($ch);

// Convert the fetched page from GBK to UTF-8 (curl_exec returns false on failure)
if ($content !== false) {
    $content = mb_convert_encoding($content, "UTF-8", "GBK");
}
?>
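
The title also mentions file_get_contents, which, as far as I know, does not decompress gzip responses by itself. Below is a minimal sketch of how the same two fixes might be applied there, assuming the zlib and mbstring extensions are loaded. The helper names decode_page and fetch_page are hypothetical and do not come from the original code, and the GBK default is only an assumption that matches the Sohu example.

```php
<?php
// Hypothetical helper: normalize a raw response body to UTF-8.
function decode_page(string $body, string $fromCharset = 'GBK'): string
{
    // A gzip stream starts with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $inflated = gzdecode($body);
        if ($inflated !== false) {
            $body = $inflated;
        }
    }
    // Same conversion step as in the curl version above.
    return mb_convert_encoding($body, 'UTF-8', $fromCharset);
}

// Hypothetical wrapper: fetch a URL with file_get_contents and normalize it.
// Returns false when the request fails.
function fetch_page(string $url, string $fromCharset = 'GBK')
{
    $context = stream_context_create(array(
        'http' => array(
            'header'  => "Accept-Encoding: gzip\r\n", // ask the server for gzip
            'timeout' => 10,                           // read timeout, in seconds
        ),
    ));
    $raw = @file_get_contents($url, false, $context);
    return $raw === false ? false : decode_page($raw, $fromCharset);
}
```

Calling fetch_page('http://tv.sohu.com') should then give roughly the same result as the curl version, minus the response headers. Checking the magic bytes rather than the Content-Encoding header keeps the sketch short; a stricter version could inspect $http_response_header instead.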
