What should you pay attention to when getting web content in PHP?

王林
Release: 2023-02-25 09:24:01
Original

Notes on obtaining web page content with PHP

1. Network errors happen, and almost any kind of failure is possible: the server is down, a cable is cut, the domain name is wrong, the connection times out, the page has been removed, the site redirects elsewhere, access is blocked, the host is overloaded...

2. The server restricts access and only serves requests that look like they come from a common browser (typically by checking the User-Agent header).

3. The server has anti-hotlinking protection (typically by checking the Referer header).

4. Some websites ignore whether your HTTP request contains an Accept-Encoding header, and what that header says; they always send you gzip-compressed content regardless.

5. URLs come in all kinds of odd shapes: some contain Chinese characters, and some even contain carriage returns and line feeds.

6. Some websites send a Content-Type in the HTTP header and also declare several Content-Types inside the page itself. Worse, these declarations may all differ from one another, and worst of all, the text may not actually use any of them, which results in garbled characters.

7. The network connection can be very slow. Multiply that by the time needed to fetch and analyze thousands of pages, and I suggest you go and have a good meal while you wait.
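Pitfall 5 can be handled before a request is even sent. The following is a minimal sketch, not taken from the original article: the helper name normalize_url() is hypothetical, it needs PHP 7.4+ (arrow functions), and it assumes any Chinese characters in the URL are UTF-8 encoded. A site that expects GBK-encoded query strings would need the bytes converted first. It strips stray carriage returns, line feeds and tabs, then percent-encodes spaces and non-ASCII bytes while leaving all other characters alone.

```php
<?php
// Hypothetical helper (not from the original article): clean up a scraped URL
// before passing it to file_get_contents() or cURL.
// Assumption: non-ASCII characters in $url are UTF-8 encoded.
function normalize_url(string $url): string
{
    // Drop carriage returns, line feeds and tabs wherever they appear,
    // then trim leading and trailing spaces.
    $url = trim(str_replace(["\r", "\n", "\t"], '', $url));

    // Percent-encode spaces and every byte outside the ASCII range
    // (e.g. the UTF-8 bytes of Chinese characters). Other ASCII characters,
    // including existing %XX escapes, are left untouched.
    return preg_replace_callback(
        '/[\x80-\xFF ]+/',
        fn(array $m): string => rawurlencode($m[0]),
        $url
    );
}
```

Usage is a one-liner such as $html = file_get_contents(normalize_url($rawUrl));. The helper deliberately does not re-encode characters like & or ?, since in a scraped link those are usually meant as URL syntax.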

Methods for getting web page content in PHP

Method 1. Use file_get_contents

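$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";
$html = file_get_contents($url);
if ($html === false) {
    // file_get_contents() returns false on network errors, timeouts, 404s, etc.
    die("Failed to fetch $url");
}
// If the Chinese text comes out garbled, convert it from GB2312 to UTF-8:
// $html = iconv("gb2312", "utf-8", $html);
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE) . "</textarea>";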
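A fixed iconv("gb2312", "utf-8", ...) call only helps when the page really is GB2312, and pitfall 6 warns that declared charsets can be wrong. Below is a minimal sketch of a more defensive conversion, assuming PHP 8 with the mbstring extension and, like the article's example, a Chinese-language site. The helper name to_utf8() and its $contentType parameter are hypothetical. It trusts the bytes first (valid UTF-8 is returned unchanged), then the HTTP header, then any charset declared in the page, and decodes GB2312/GBK as GB18030, a superset that also covers characters the older tables lack.

```php
<?php
// Hypothetical helper (not from the original article): return $html as UTF-8.
// $contentType is the raw Content-Type response header if you have it,
// e.g. "text/html; charset=gbk". Assumes PHP 8 with the mbstring extension.
function to_utf8(string $html, ?string $contentType = null): string
{
    // Trust the bytes over any declaration: valid UTF-8 is returned unchanged.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }

    // Look for a declared charset, HTTP header first, then the page itself.
    // The pattern matches both <meta charset="..."> and content="...; charset=...".
    $charset = null;
    foreach ([$contentType ?? '', $html] as $source) {
        if (preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $source, $m)) {
            $charset = strtoupper($m[1]);
            break;
        }
    }

    // No declaration, a GB-family name, or a "UTF-8" claim the bytes just disproved:
    // decode as GB18030, a superset of GB2312 and GBK (assumption: a Chinese site).
    if ($charset === null || in_array($charset, ['GB2312', 'GBK', 'GB18030', 'UTF-8', 'UTF8'], true)) {
        $charset = 'GB18030';
    }

    try {
        return mb_convert_encoding($html, 'UTF-8', $charset);
    } catch (ValueError $e) {
        // PHP 8 throws ValueError for charset names mbstring does not know.
        return mb_convert_encoding($html, 'UTF-8', 'GB18030');
    }
}
```

In Method 1, $html = to_utf8($html); would take the place of the commented-out iconv line. The "bytes first" rule is a heuristic: a short GBK string can occasionally happen to be valid UTF-8 and would then be left unconverted.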

Method 2. Use cURL

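$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // give up if no connection is made within 10 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects to the final page
$html = curl_exec($ch);
if ($html === false) {
    $error = curl_error($ch);
    curl_close($ch);
    die("Failed to fetch $url: $error");
}
curl_close($ch);

echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE) . "</textarea>";

Setting CURLOPT_FOLLOWLOCATION to 1, as in the line curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); above, means that if the request is redirected, you receive the final page. Without it, the result is only the redirect response, something like this: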

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a href="some link." rel="external nofoll
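The two methods above cover only the basics. As a minimal sketch assuming PHP 8 with the cURL extension, the hypothetical helper fetch_with_curl() below combines settings aimed at several pitfalls from the start of the article: a browser-like User-Agent (pitfall 2), an optional Referer (pitfall 3), automatic decompression (pitfall 4, as long as the server labels the body with a Content-Encoding header), a redirect limit, timeouts for slow links (pitfall 7), and a check of both transport errors and the HTTP status code (pitfall 1). The User-Agent string, redirect limit and timeout values are arbitrary examples, not requirements.

```php
<?php
// Hypothetical helper (not from the original article): fetch a page with cURL,
// returning the body on a 2xx response and null on any failure.
// The User-Agent string and the numeric limits are example values.
function fetch_with_curl(string $url, ?string $referer = null): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,   // avoid endless redirect loops
        CURLOPT_CONNECTTIMEOUT => 10,  // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => 30,  // seconds allowed for the whole transfer
        // Browser-like User-Agent for servers that only serve "common browsers"
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
        // Empty string: advertise every encoding libcurl supports and decompress
        // responses that arrive with a Content-Encoding such as gzip
        CURLOPT_ENCODING       => '',
    ]);
    if ($referer !== null) {
        // Anti-hotlinking checks often look at the Referer header
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response arrived
    curl_close($ch);

    // Treat transport errors (DNS failure, timeout, refused connection) and
    // non-2xx responses (403, 404, 500 ...) alike as failures.
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

Returning null on every failure keeps a crawling loop over thousands of URLs simple. If you need to know why a request failed, log curl_error() and the status code before returning. The returned body still has to go through charset handling, since cURL does not convert encodings.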


Source: php.cn