Notes on obtaining web page content with PHP
1. Network errors will happen, and almost any kind of failure is possible: the server is down, a cable is unplugged, the domain name is wrong, the connection times out, the page no longer exists, the site redirects elsewhere, your access is blocked, the host is overloaded, and so on.
2. The server may restrict access to common browsers only, usually by checking the User-Agent header.
3. The server may have anti-hotlinking protection, usually by checking the Referer header.
4. Some websites ignore the Accept-Encoding header in your request, whether or not it is present and whatever it says, and always send you gzip-compressed content.
5. URLs come in all sorts of odd forms: some contain Chinese characters, and some even contain carriage returns and line feeds.
6. Some websites declare a Content-Type in the HTTP header and several more inside the page, and each one names a different charset. Worse still, none of them may match the encoding the text actually uses, which produces garbled characters.
7. Network connections can be very slow. Multiply that by the time needed to analyze thousands of pages, and you may as well go have a good meal while you wait.
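Items 5 and 6 can be handled before and after the request itself. The sketch below assumes the mbstring extension is available; `clean_url()` and `to_utf8()` are hypothetical helper names, and the fallback charsets (GB18030, BIG-5) are an assumption suited to Chinese sites. Because every declared charset may be wrong, it treats "the bytes are valid in this encoding" as the deciding test and uses the declarations only as an ordered list of candidates.

```php
<?php
// Hypothetical helper for item 5: make a scraped link safe to request.
function clean_url(string $url): string
{
    // Remove carriage returns / line feeds anywhere in the link, then trim spaces.
    $url = trim(str_replace(["\r", "\n"], '', $url));
    // Percent-encode every byte outside printable ASCII (spaces, control
    // characters, and the UTF-8 bytes of Chinese characters).
    return preg_replace_callback(
        '/[^\x21-\x7E]/',
        fn(array $m): string => rawurlencode($m[0]),
        $url
    );
}

// Hypothetical helper for item 6: return the page as UTF-8 without trusting
// any single charset declaration. Requires the mbstring extension.
function to_utf8(string $html, ?string $contentTypeHeader = null): string
{
    // Bytes that are already valid UTF-8 are kept, whatever the page declares.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    // Candidate charsets: the HTTP header first, then every <meta> declaration.
    $candidates = [];
    if ($contentTypeHeader !== null
        && preg_match('/charset=["\']?([\w-]+)/i', $contentTypeHeader, $m)) {
        $candidates[] = $m[1];
    }
    if (preg_match_all('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $all)) {
        $candidates = array_merge($candidates, $all[1]);
    }
    // Assumed fallbacks for Chinese sites; adjust for the sites you crawl.
    $candidates = array_merge($candidates, ['GB18030', 'BIG-5']);

    foreach (array_unique(array_map('strtoupper', $candidates)) as $charset) {
        try {
            // Use the first candidate under which the bytes are valid.
            if (mb_check_encoding($html, $charset)) {
                return mb_convert_encoding($html, 'UTF-8', $charset);
            }
        } catch (ValueError $e) {
            // Unknown charset name (PHP 8 throws here); try the next one.
        }
    }
    // Last resort: replace invalid byte sequences with a substitute character.
    return mb_scrub($html, 'UTF-8');
}
```

For example, `clean_url("http://example.com/新闻\r\n")` yields `http://example.com/%E6%96%B0%E9%97%BB`. Already-encoded `%XX` sequences are left alone because `%` is printable ASCII.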
Methods for getting web page content in PHP
Method 1: use file_get_contents()
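$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";
$html = file_get_contents($url);
if ($html === false) {
    die("Failed to fetch $url");
}
// If Chinese text comes out garbled, convert it with the line below
// (replace "gb2312" with the page's real charset)
// $html = iconv("gb2312", "utf-8", $html);
// htmlspecialchars() stops markup in the page from closing the textarea early
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE) . "</textarea>";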
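The plain call in Method 1 has no timeout of its own (it falls back to the `default_socket_timeout` setting, 60 seconds by default), sends no browser-like User-Agent or Referer, and returns gzip bodies still compressed. A minimal sketch of the same method that addresses pitfalls 1 to 4 and 7 through a stream context; `fetch_with_fgc()` is a hypothetical name, and the header values and limits are assumptions to adapt.

```php
<?php
// Hypothetical helper: Method 1 with a stream context. The User-Agent string,
// redirect limit and timeout are illustrative assumptions.
function fetch_with_fgc(string $url, string $referer = '', float $timeout = 10.0): array
{
    $headers = ['User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)'];
    if ($referer !== '') {
        $headers[] = 'Referer: ' . $referer; // for anti-hotlinking checks
    }
    $context = stream_context_create([
        'http' => [
            'header'        => implode("\r\n", $headers),
            'timeout'       => $timeout, // read timeout in seconds
            'max_redirects' => 5,        // give up on redirect loops
            'ignore_errors' => true,     // also return the body of 4xx/5xx responses
        ],
    ]);

    // Failures are reported through the return value, so silence the warning.
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        $err = error_get_last();
        return ['ok' => false, 'status' => 0, 'body' => '',
                'error' => $err['message'] ?? 'request failed'];
    }

    // The http:// wrapper fills $http_response_header with the headers of every
    // response in the redirect chain; the last status line is the final one.
    $status = 0;
    $gzipped = false;
    foreach ($http_response_header ?? [] as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
            $gzipped = false; // a new response starts here
        } elseif (stripos($line, 'Content-Encoding:') === 0 && stripos($line, 'gzip') !== false) {
            $gzipped = true;
        }
    }
    // The wrapper does not decompress, so check the header and the gzip magic bytes.
    if ($gzipped || strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            $body = $decoded;
        }
    }
    return ['ok' => $status >= 200 && $status < 400, 'status' => $status,
            'body' => $body, 'error' => ''];
}
```

For example, `fetch_with_fgc($url, "http://news.sina.com.cn/")` requests the page from Method 1 with a same-site Referer and hands back the status code instead of failing silently.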
Method 2: use cURL
$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // give up if no connection within 10 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow HTTP redirects
$html = curl_exec($ch);
if ($html === false) {
    die("cURL error: " . curl_error($ch));
}
curl_close($ch);
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE) . "</textarea>";
The line
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
tells cURL to follow redirects, so you receive the final page. Without it, a redirected request returns only the redirect response itself, for example:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a href="...">here</a>.</body>
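The cURL version in Method 2 can be hardened against the same pitfalls. A sketch, assuming the curl extension is loaded; `fetch_with_curl()` is a hypothetical name and the User-Agent, timeouts and redirect limit are illustrative values. Setting `CURLOPT_ENCODING` to an empty string makes libcurl advertise every encoding it supports and decompress gzip or deflate responses, which covers servers that always send gzip (item 4).

```php
<?php
// Hypothetical helper: Method 2 with browser-like headers, automatic
// decompression, bounded redirects and timeouts, and explicit error reporting.
function fetch_with_curl(string $url, string $referer = ''): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the final page
        CURLOPT_MAXREDIRS      => 5,     // but not forever
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds allowed to connect
        CURLOPT_TIMEOUT        => 30,    // seconds allowed for the whole transfer
        CURLOPT_ENCODING       => '',    // send Accept-Encoding, decode gzip/deflate bodies
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    ]);
    if ($referer !== '') {
        curl_setopt($ch, CURLOPT_REFERER, $referer); // for anti-hotlinking checks
    }

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response arrived
    $result = [
        'ok'           => $body !== false && $status >= 200 && $status < 400,
        'status'       => $status,
        'content_type' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE), // may be null
        'final_url'    => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body'         => $body === false ? '' : $body,
        'error'        => $body === false ? curl_error($ch) : '',
    ];
    curl_close($ch);
    return $result;
}
```

Returning `status` and `error` instead of echoing lets a crawler log a failure and move on to the next page, which matters when, as item 1 warns, any request may fail. `content_type` is returned because its charset is one of the hints to weigh against the page's own declarations.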