Table of Contents
How to use file_get_contents in PHP to crawl Chinese garbled web pages,
Home Backend Development PHP Tutorial How to use file_get_contents in PHP to crawl Chinese garbled web pages, _PHP tutorial

How to use file_get_contents in PHP to crawl Chinese garbled web pages, _PHP tutorial

Jul 13, 2016 am 10:11 AM
php Chinese garbled crawl Solution

How to use file_get_contents in PHP to crawl Chinese garbled web pages,

The example in this article describes how to use file_get_contents in PHP to crawl Chinese garbled web pages. Share it with everyone for your reference. The specific method is as follows:

The file_get_contents function is originally a very excellent local and remote file operation function that comes with PHP. It allows us to download remote data directly without any effort, but I encountered some problems when using it to read web pages. The page is garbled. Here we will summarize the specific solutions for you.

According to friends on the Internet, the reason may be that the server has turned on GZIP compression. The following is to use firebug to check the header information of my website. Gzip is turned on, and the original header information of the request header information is as follows:

Copy code The code is as follows:
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q =0.8
Accept-Encoding gzip, deflate
Accept-Language zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
Connection keep-alive
Cookie __utma=225240837.787252530.1317310581.1335406161.1335411401.1537; __utmz=225240837.1326850415.887.3.utmcsr=google|utmccn=(organic)|utmcmd=organic| utmctr=%E4%BB%BB%E4%BD%95%E9%A1%B9% E7%9B%AE%E9%83%BD%E4%B8%8D%E4%BC%9A%E9%82%A3%E4%B9%88%E7%AE%80%E5%8D%95%20site% 3Awww.nowamagic.net; PHPSESSID=888mj4425p8s0m7s0frre3ovc7; __utmc=225240837; __utmb=225240837.1.10.1335411401
Host www.jb51.net
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0

The Content-Encoding item can be found from the header information and is Gzip.

The solution is relatively simple, which is to use curl instead of file_get_contents to obtain, and then add one to the curl configuration parameters. The code is as follows:

Copy code The code is as follows:
curl_setopt($ch, CURLOPT_ENCODING, "gzip");

When I used file_get_contents to capture pictures today, I didn’t notice this problem at first, and it took a lot of effort to find it out.

Use the built-in zlib library. If the server has installed the zlib library, you can easily solve the garbled code problem by using the following code:

Copy code The code is as follows:
$data = file_get_contents("compress.zlib://".$url);

I hope this article will be helpful to everyone’s PHP programming design.

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/929093.htmlTechArticleSolution to the problem of using file_get_contents to crawl Chinese garbled web pages in PHP. This article describes the use of file_get_contents in PHP to crawl web pages. Solution to the problem of Chinese garbled characters. Share with everyone...
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Chat Commands and How to Use Them
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Unable to log in to mysql as root Unable to log in to mysql as root Apr 08, 2025 pm 04:54 PM

The main reasons why you cannot log in to MySQL as root are permission problems, configuration file errors, password inconsistent, socket file problems, or firewall interception. The solution includes: check whether the bind-address parameter in the configuration file is configured correctly. Check whether the root user permissions have been modified or deleted and reset. Verify that the password is accurate, including case and special characters. Check socket file permission settings and paths. Check that the firewall blocks connections to the MySQL server.

Solutions to the errors reported by MySQL on a specific system version Solutions to the errors reported by MySQL on a specific system version Apr 08, 2025 am 11:54 AM

The solution to MySQL installation error is: 1. Carefully check the system environment to ensure that the MySQL dependency library requirements are met. Different operating systems and version requirements are different; 2. Carefully read the error message and take corresponding measures according to prompts (such as missing library files or insufficient permissions), such as installing dependencies or using sudo commands; 3. If necessary, try to install the source code and carefully check the compilation log, but this requires a certain amount of Linux knowledge and experience. The key to ultimately solving the problem is to carefully check the system environment and error information, and refer to the official documents.

Navicat's solution to the database cannot be connected Navicat's solution to the database cannot be connected Apr 08, 2025 pm 11:12 PM

The following steps can be used to resolve the problem that Navicat cannot connect to the database: Check the server connection, make sure the server is running, address and port correctly, and the firewall allows connections. Verify the login information and confirm that the user name, password and permissions are correct. Check network connections and troubleshoot network problems such as router or firewall failures. Disable SSL connections, which may not be supported by some servers. Check the database version to make sure the Navicat version is compatible with the target database. Adjust the connection timeout, and for remote or slower connections, increase the connection timeout timeout. Other workarounds, if the above steps are not working, you can try restarting the software, using a different connection driver, or consulting the database administrator or official Navicat support.

The Future of PHP: Adaptations and Innovations The Future of PHP: Adaptations and Innovations Apr 11, 2025 am 12:01 AM

The future of PHP will be achieved by adapting to new technology trends and introducing innovative features: 1) Adapting to cloud computing, containerization and microservice architectures, supporting Docker and Kubernetes; 2) introducing JIT compilers and enumeration types to improve performance and data processing efficiency; 3) Continuously optimize performance and promote best practices.

MySQL can't be installed after downloading MySQL can't be installed after downloading Apr 08, 2025 am 11:24 AM

The main reasons for MySQL installation failure are: 1. Permission issues, you need to run as an administrator or use the sudo command; 2. Dependencies are missing, and you need to install relevant development packages; 3. Port conflicts, you need to close the program that occupies port 3306 or modify the configuration file; 4. The installation package is corrupt, you need to download and verify the integrity; 5. The environment variable is incorrectly configured, and the environment variables must be correctly configured according to the operating system. Solve these problems and carefully check each step to successfully install MySQL.

How to solve mysql cannot be started How to solve mysql cannot be started Apr 08, 2025 pm 02:21 PM

There are many reasons why MySQL startup fails, and it can be diagnosed by checking the error log. Common causes include port conflicts (check port occupancy and modify configuration), permission issues (check service running user permissions), configuration file errors (check parameter settings), data directory corruption (restore data or rebuild table space), InnoDB table space issues (check ibdata1 files), plug-in loading failure (check error log). When solving problems, you should analyze them based on the error log, find the root cause of the problem, and develop the habit of backing up data regularly to prevent and solve problems.

PHP vs. Python: Understanding the Differences PHP vs. Python: Understanding the Differences Apr 11, 2025 am 12:15 AM

PHP and Python each have their own advantages, and the choice should be based on project requirements. 1.PHP is suitable for web development, with simple syntax and high execution efficiency. 2. Python is suitable for data science and machine learning, with concise syntax and rich libraries.

How to solve mysql cannot connect to local host How to solve mysql cannot connect to local host Apr 08, 2025 pm 02:24 PM

The MySQL connection may be due to the following reasons: MySQL service is not started, the firewall intercepts the connection, the port number is incorrect, the user name or password is incorrect, the listening address in my.cnf is improperly configured, etc. The troubleshooting steps include: 1. Check whether the MySQL service is running; 2. Adjust the firewall settings to allow MySQL to listen to port 3306; 3. Confirm that the port number is consistent with the actual port number; 4. Check whether the user name and password are correct; 5. Make sure the bind-address settings in my.cnf are correct.

See all articles