Last night, a friend in our group who was scraping web pages found that the pages fetched with file_get_contents came out garbled when saved locally. The response header contained Content-Encoding: gzip, yet the pages displayed normally in the browser.
Having run into this before, I immediately recognized the cause: the site has gzip enabled, and file_get_contents was returning the compressed page instead of the decompressed one. (I wasn't sure whether file_get_contents can be given some parameter when requesting the page so that it fetches the content without gzip compression.)
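As an aside, one way to attempt this is to send an explicit Accept-Encoding: identity request header through a stream context, asking the server not to compress. Whether it works depends on the server honoring the header, so treat this as a minimal sketch rather than a guaranteed fix:
// Sketch: request an uncompressed response via a stream context.
// The server may or may not honor Accept-Encoding: identity.
$context = stream_context_create(array(
    'http' => array(
        'header' => "Accept-Encoding: identity\r\n"
    )
));
$html = file_get_contents('http://www.miercn.com', false, $context);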
I had recently read that a file's type can be determined by reading its first two bytes. Friends in the group also pointed out that the first two bytes of any gzip stream are the magic number 1F 8B (this is independent of the page's character encoding, gbk in this case), so you can use them to tell whether a page has been gzip-compressed.
The code is as follows:
// Mier Military Network (www.miercn.com) serves its pages gzip-compressed,
// so the page fetched directly with file_get_contents is garbled.
header('Content-Type:text/html;charset=utf-8');
$url = 'http://www.miercn.com';
$file = fopen($url, "rb");
// Read only the first 2 bytes: if they are 1F 8B (hex), i.e. 31 139 (decimal),
// gzip is enabled.
$bin = fread($file, 2);
fclose($file);
$strInfo = @unpack("C2chars", $bin);
$typeCode = intval($strInfo['chars1'].$strInfo['chars2']);
$isGzip = 0;
switch ($typeCode)
{
    case 31139:        // gzip magic number 1F 8B
        $isGzip = 1;
        break;
    default:
        $isGzip = 0;
}
// If gzipped, read through the compress.zlib:// stream wrapper,
// which decompresses on the fly.
$url = $isGzip ? "compress.zlib://".$url : $url; // ternary expression
$mierHtml = file_get_contents($url);             // fetch the Miercn page
$mierHtml = iconv("gbk", "utf-8", $mierHtml);    // the site is gbk-encoded; convert to utf-8
echo $mierHtml;
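If the cURL extension is available, a simpler alternative is to let cURL handle the compression itself via CURLOPT_ENCODING, which automatically decodes gzip responses. This is a sketch of that alternative, not part of the original fix:
$ch = curl_init('http://www.miercn.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, '');         // '' = accept every encoding cURL supports and decode automatically
$html = curl_exec($ch);
curl_close($ch);
$html = iconv("gbk", "utf-8", $html);           // convert from the site's gbk encoding
echo $html;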