This article shows how to use PHP to detect whether a web page is served with gzip compression. It may be a useful reference if you run into the same problem.
While scraping web pages, I found that pages fetched with file_get_contents were garbled when saved locally, although they displayed normally in the browser. The response header contained Content-Encoding: gzip. From this we can conclude that the site has gzip enabled and that file_get_contents returns the compressed bytes rather than the decompressed page. (I wonder whether file_get_contents can be given some parameter when requesting the page so that it directly obtains a version that has not been gzip-compressed?)

I had seen before that a file's type can be determined by reading its first 2 bytes. The first 2 bytes of a gzip-compressed page are 1F 8B (the page in this example is GBK-encoded once decompressed), which can be used to determine whether the page has been gzip-compressed.

Example:

```php
<?php
// Fetch a page that may be gzip-compressed.
// The page obtained directly with file_get_contents is garbled.
header('Content-Type: text/html; charset=utf-8');

$url = 'http://bbs.it-home.org';

$file = fopen($url, 'rb');
if ($file === false) {
    exit('Unable to open ' . $url);
}
// Read only 2 bytes. If they are 1f 8b (hex), i.e. 31 139 (decimal), gzip is enabled.
$bin = fread($file, 2);
fclose($file);

$strInfo = @unpack('C2chars', $bin);
$typeCode = $strInfo ? intval($strInfo['chars1'] . $strInfo['chars2']) : 0;

switch ($typeCode) {
    case 31139: // the site has gzip enabled
        $isGzip = 1;
        break;
    default:
        $isGzip = 0;
}

// Ternary expression: prepend the compress.zlib:// wrapper so PHP decompresses the stream.
$url = $isGzip ? 'compress.zlib://' . $url : $url;
$mierHtml = file_get_contents($url);          // fetch the data
$mierHtml = iconv('gbk', 'utf-8', $mierHtml); // convert the GBK page to UTF-8
echo $mierHtml;
?>
```
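The same 1F 8B check can also be applied to a string that is already in memory, so the page only needs to be fetched once. The following is a minimal sketch, not the original author's method. It assumes the zlib extension is loaded (for gzencode and gzdecode), and the function names isGzipData and decodeIfGzipped are hypothetical helpers introduced only for illustration. The sample data is generated locally so the snippet runs without network access.

```php
<?php
// Hypothetical helper: true if $data starts with the gzip magic bytes 1F 8B.
function isGzipData(string $data): bool
{
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

// Hypothetical helper: decompress $data if it looks like gzip, otherwise
// return it unchanged. If decompression fails, the original bytes are returned.
function decodeIfGzipped(string $data): string
{
    if (!isGzipData($data)) {
        return $data;
    }
    $decoded = @gzdecode($data); // requires the zlib extension
    return $decoded === false ? $data : $decoded;
}

// Usage with locally generated sample data (no network request involved).
$plain = '<html>hello</html>';
$compressed = gzencode($plain);

var_dump(isGzipData($compressed));        // bool(true)
var_dump(isGzipData($plain));             // bool(false)
echo decodeIfGzipped($compressed), "\n";  // <html>hello</html>
```

In a scraper, the string returned by a single file_get_contents call could be passed through decodeIfGzipped before any charset conversion such as iconv.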