PHP gzip decompression sometimes fails

Jun 23, 2016 pm 01:20 PM

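While scraping data from a website, the responses come back as gzip-compressed documents sent with chunked transfer encoding. The site's server identifies itself as IIS.

Decoding the chunked encoding works fine, but decompressing the gzip document occasionally fails, which stops me from extracting the links for the next batch of requests.

After roughly 10 batches have been decompressed, a decompression failure shows up.

This is the data before decompression:



The data after decompression:


Clearly, decompression failed for the last batch.

These are the three methods I have tried: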

private function _deCompressData()
{
    if ($this->is_gzip) {
        $this->response_body = gzinflate(substr($this->response_body, 10));

        // if ($temp = gzdecode($this->response_body)) {
        //     $this->response_body = $temp;
        // } else {
        //     $this->response_body = $this->mygzdecode($this->response_body);
        // }

        // $this->response_body = $this->mygzdecode($this->response_body);

        // $this->response_body = gzdecode($this->response_body);
    }
}


This is the mygzdecode function:

/**
 * @desc Custom gzip decoding function
 */
function mygzdecode($data, &$filename = '', &$error = '', $maxlength = null)
{
    $len = strlen($data);
    if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
        $error = "Not in GZIP format.";
        return null;  // Not GZIP format (See RFC 1952)
    }
    $method = ord(substr($data, 2, 1));  // Compression method
    $flags = ord(substr($data, 3, 1));  // Flags
    if (($flags & 31) != $flags) {  // parentheses needed: != binds tighter than &
        $error = "Reserved bits not allowed.";
        return null;
    }
    // NOTE: $mtime may be negative (PHP integer limitations)
    $mtime = unpack("V", substr($data, 4, 4));
    $mtime = $mtime[1];
    $xfl = substr($data, 8, 1);
    $os = substr($data, 9, 1);  // the OS byte follows XFL (RFC 1952)
    $headerlen = 10;
    $extralen = 0;
    $extra = "";
    if ($flags & 4) {
        // 2-byte length prefixed EXTRA data in header
        if ($len - $headerlen - 2 < 8) {
            return false;  // invalid
        }
        // XLEN sits right after the fixed 10-byte header
        $extralen = unpack("v", substr($data, $headerlen, 2));
        $extralen = $extralen[1];
        if ($len - $headerlen - 2 - $extralen < 8) {
            return false;  // invalid
        }
        $extra = substr($data, $headerlen + 2, $extralen);
        $headerlen += 2 + $extralen;
    }
    $filenamelen = 0;
    $filename = "";
    if ($flags & 8) {
        // C-style string file NAME data in header
        if ($len - $headerlen - 1 < 8) {
            return false;  // invalid
        }
        $filenamelen = strpos(substr($data, $headerlen), chr(0));
        if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
            return false;  // invalid
        }
        $filename = substr($data, $headerlen, $filenamelen);
        $headerlen += $filenamelen + 1;
    }
    $commentlen = 0;
    $comment = "";
    if ($flags & 16) {
        // C-style string COMMENT data in header
        if ($len - $headerlen - 1 < 8) {
            return false;  // invalid
        }
        $commentlen = strpos(substr($data, $headerlen), chr(0));
        if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
            return false;  // Invalid header format
        }
        $comment = substr($data, $headerlen, $commentlen);
        $headerlen += $commentlen + 1;
    }
    $headercrc = "";
    if ($flags & 2) {
        // 2-bytes (lowest order) of CRC32 on header present
        if ($len - $headerlen - 2 < 8) {
            return false;  // invalid
        }
        $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
        $headercrc = unpack("v", substr($data, $headerlen, 2));
        $headercrc = $headercrc[1];
        if ($headercrc != $calccrc) {
            $error = "Header checksum failed.";
            return false;  // Bad header CRC
        }
        $headerlen += 2;
    }
    // GZIP FOOTER
    $datacrc = unpack("V", substr($data, -8, 4));
    $datacrc = sprintf('%u', $datacrc[1] & 0xFFFFFFFF);
    $isize = unpack("V", substr($data, -4));
    $isize = $isize[1];
    // decompression:
    $bodylen = $len - $headerlen - 8;
    if ($bodylen < 1) {
        // IMPLEMENTATION BUG!
        return null;
    }
    $body = substr($data, $headerlen, $bodylen);
    $data = "";
    if ($bodylen > 0) {
        switch ($method) {
            case 8:
                // Currently the only supported compression method:
                $data = gzinflate($body, (int) $maxlength);
                break;
            default:
                $error = "Unknown compression method.";
                return false;
        }
    }  // zero-byte body content is allowed
    // Verify CRC32
    $crc = sprintf("%u", crc32($data));
    $crcOK = $crc == $datacrc;
    $lenOK = $isize == strlen($data);
    if (!$lenOK || !$crcOK) {
        $error = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
        return false;
    }
    return $data;
}



In other words, when decompressing repeatedly, decompression sometimes fails.


Replies and discussion (solutions)

PHP already provides a gzdecode function.
If your PHP version is really old and has no gzdecode function,
then a PHP-level gzdecode function looks like this:

if (!function_exists('gzdecode')) {  // built in since PHP 5.4; avoid redeclaring it
function gzdecode($data) {
  $len = strlen($data);
  if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) {
    return $data;  // Not GZIP format (See RFC 1952)
  }
  $method = ord(substr($data,2,1));  // Compression method
  $flags  = ord(substr($data,3,1));  // Flags
  if (($flags & 31) != $flags) {
    // Reserved bits are set -- NOT ALLOWED by RFC 1952
    return $data;
  }
  // NOTE: $mtime may be negative (PHP integer limitations)
  $mtime = unpack("V", substr($data,4,4));
  $mtime = $mtime[1];
  $xfl   = substr($data,8,1);
  $os    = substr($data,9,1);  // the OS byte follows XFL
  $headerlen = 10;
  $extralen  = 0;
  $extra     = "";
  if ($flags & 4) {
    // 2-byte length prefixed EXTRA data in header
    if ($len - $headerlen - 2 < 8) {
      return false;    // Invalid format
    }
    $extralen = unpack("v",substr($data,$headerlen,2));  // XLEN follows the fixed header
    $extralen = $extralen[1];
    if ($len - $headerlen - 2 - $extralen < 8) {
      return false;    // Invalid format
    }
    $extra = substr($data,$headerlen+2,$extralen);
    $headerlen += 2 + $extralen;
  }
  $filenamelen = 0;
  $filename = "";
  if ($flags & 8) {
    // C-style string file NAME data in header
    if ($len - $headerlen - 1 < 8) {
      return false;    // Invalid format
    }
    $filenamelen = strpos(substr($data,$headerlen),chr(0));
    if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
      return false;    // Invalid format
    }
    $filename = substr($data,$headerlen,$filenamelen);
    $headerlen += $filenamelen + 1;
  }
  $commentlen = 0;
  $comment = "";
  if ($flags & 16) {
    // C-style string COMMENT data in header
    if ($len - $headerlen - 1 < 8) {
      return false;    // Invalid format
    }
    $commentlen = strpos(substr($data,$headerlen),chr(0));
    if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
      return false;    // Invalid header format
    }
    $comment = substr($data,$headerlen,$commentlen);
    $headerlen += $commentlen + 1;
  }
  $headercrc = "";
  if ($flags & 2) {  // FHCRC is bit 1 (value 2); value 1 is FTEXT
    // 2-bytes (lowest order) of CRC32 on header present
    if ($len - $headerlen - 2 < 8) {
      return false;    // Invalid format
    }
    $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff;
    $headercrc = unpack("v", substr($data,$headerlen,2));
    $headercrc = $headercrc[1];
    if ($headercrc != $calccrc) {
      return false;    // Bad header CRC
    }
    $headerlen += 2;
  }
  // GZIP FOOTER - These may be negative due to PHP's limitations
  $datacrc = unpack("V",substr($data,-8,4));
  $datacrc = $datacrc[1];
  $isize = unpack("V",substr($data,-4));
  $isize = $isize[1];
  // Perform the decompression:
  $bodylen = $len-$headerlen-8;
  if ($bodylen < 1) {
    // This should never happen - IMPLEMENTATION BUG!
    return null;
  }
  $body = substr($data,$headerlen,$bodylen);
  $data = "";
  if ($bodylen > 0) {
    switch ($method) {
      case 8:
        // Currently the only supported compression method:
        $data = gzinflate($body);
        break;
      default:
        // Unknown compression method
        return false;
    }
  } else {
    // I'm not sure if zero-byte body content is allowed.
    // Allow it for now...  Do nothing...
  }
  // Verify decompressed size and CRC32:
  // NOTE: This may fail with large data sizes depending on how
  //       PHP's integer limitations affect strlen() since $isize
  //       may be negative for large sizes.
  if ($isize != strlen($data) || crc32($data) != $datacrc) {
    // Bad format!  Length or CRC doesn't match!
    return false;
  }
  return $data;
}
}

Compare it with yours and see whether you made a mistake when copying it.

Since the function returns false when the length check or the CRC32 check on the input fails, you should test the return value before moving on to the next step.
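Checking the decoder's return value before moving on could look roughly like the following. This is a minimal sketch: `decodeGzipBody` is a hypothetical helper name, and the sample payload is made up for illustration. It relies only on the documented behaviour that `gzdecode()` returns `false` on failure.

```php
<?php
// Hypothetical helper: returns the decoded body, or null when decoding fails.
function decodeGzipBody(string $raw): ?string
{
    // gzdecode() emits a warning on corrupt input; suppress it and test the result instead.
    $decoded = @gzdecode($raw);
    return $decoded === false ? null : $decoded;
}

// Made-up sample data: one complete body and one cut short in transit.
$good = gzencode('<a href="/works/1">next</a>');
$truncated = substr($good, 0, intdiv(strlen($good), 2));

foreach (['complete' => $good, 'truncated' => $truncated] as $label => $raw) {
    $body = decodeGzipBody($raw);
    if ($body === null) {
        // Do not try to extract links from a body that failed to decode.
        echo "$label: decode failed, skipping link extraction\n";
        continue;
    }
    echo "$label: decoded ", strlen($body), " bytes\n";
}
```

The point of returning `null` rather than the raw input is that the caller cannot accidentally parse compressed bytes as HTML.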

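I'm on PHP 5.6.
gzinflate(substr($this->response_body,10));

gzdecode($this->response_body)

mygzdecode($this->response_body);

All three methods work, but they all run into the same problem: when decompressing repeatedly, decompression sometimes fails.


Happy New Year, by the way!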
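For context, the calls listed in this reply do not validate the same things. The sketch below uses made-up sample data and assumes a plain 10-byte gzip header, which is what `gzencode()` produces. Unlike the original call, it also strips the 8-byte trailer before `gzinflate()`. It shows that `gzdecode()` rejects a body whose CRC32 trailer is wrong, while raw inflation does not look at the trailer at all.

```php
<?php
// Made-up payload for illustration.
$payload = str_repeat('works page content ', 20);
$gz = gzencode($payload);

// gzdecode() parses the gzip header and verifies the CRC32/ISIZE trailer.
$viaGzdecode = gzdecode($gz);

// gzinflate() only understands raw DEFLATE, so the 10-byte header and the
// 8-byte trailer are removed by hand here; nothing is verified.
$viaInflate = gzinflate(substr($gz, 10, -8));

// Flip the first CRC32 byte of the trailer to simulate corruption.
$bad = $gz;
$crcPos = strlen($bad) - 8;
$bad[$crcPos] = chr(ord($bad[$crcPos]) ^ 0xFF);

$strictResult = @gzdecode($bad);                  // trailer check fails
$lenientResult = gzinflate(substr($bad, 10, -8)); // trailer never inspected

var_dump($viaGzdecode === $payload, $viaInflate === $payload);
var_dump($strictResult === false, $lenientResult === $payload);
```

In other words, a failure reported by the checking variants is information worth keeping, not something to work around by switching to the least strict call.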

OK, will do.

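Errors in data transmitted over a network are unavoidable, although the probability is low.
Reading the data again usually fixes it.

The main thing is that you need a fault-tolerance strategy.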
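One simple form of such a fault-tolerance strategy is to re-request the page when decoding fails. The sketch below is illustrative only: `fetchDecodedWithRetry` and the `$fetch` callable are hypothetical names, the attempt limit is an arbitrary choice, and the flaky transport is simulated rather than a real HTTP client.

```php
<?php
/**
 * Hypothetical retry wrapper: $fetch is any callable that returns the raw
 * gzip-compressed body (or a non-string on transport failure).
 */
function fetchDecodedWithRetry(callable $fetch, int $maxAttempts = 3, int $delayMs = 0): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $raw = $fetch();
        $decoded = is_string($raw) ? @gzdecode($raw) : false;
        if ($decoded !== false) {
            return $decoded;
        }
        // Optional pause before asking the server again.
        if ($delayMs > 0 && $attempt < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }
    throw new RuntimeException("gzip decode failed after {$maxAttempts} attempts");
}

// Simulated transport: the first response is truncated, the second is complete.
$full = gzencode('page body');
$calls = 0;
$flaky = function () use ($full, &$calls) {
    $calls++;
    return $calls === 1 ? substr($full, 0, 12) : $full;
};

$body = fetchDecodedWithRetry($flaky);
echo $body, " (attempts: {$calls})\n";
```

Throwing after the last attempt keeps a persistent failure visible instead of silently feeding an empty body to the link extractor.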

        // Verify CRC32
        $crc = sprintf("%u", crc32($data));
        $crcOK = $crc == $datacrc;
        $lenOK = $isize == strlen($data);
        if (!$lenOK || !$crcOK) {
            $this->status = ( $lenOK ? '' : 'Length check FAILED. ') . ( $crcOK ? '' : 'Checksum FAILED.');
            return false;
        }
        return $data;
I tracked it down: this check is where it fails.


Sending a request to the link http://www.cnu.cc/works/111706
Length check FAILED. Checksum FAILED.

Right. This part really does need strengthening. I only reset the connection and did not verify the integrity of the data I received.
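Since the responses in this thread use chunked transfer encoding, one place to verify integrity is the chunk framing itself: a chunked body is complete only once the zero-length terminating chunk has arrived. The sketch below is an assumption about how such a check could be written, not the poster's code. `decodeChunkedStrict` is a hypothetical name, and trailer headers after the last chunk are ignored.

```php
<?php
/**
 * Decode an HTTP/1.1 chunked body. Returns null when the body is incomplete:
 * a chunk shorter than its declared size, or no terminating 0-size chunk.
 */
function decodeChunkedStrict(string $body): ?string
{
    $out = '';
    $pos = 0;
    $len = strlen($body);
    while (true) {
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            return null; // the next chunk-size line never arrived
        }
        // A size line may carry ";extension" parameters; keep only the hex size.
        $sizeHex = trim(explode(';', substr($body, $pos, $eol - $pos), 2)[0]);
        if ($sizeHex === '' || strlen($sizeHex) > 8 || !ctype_xdigit($sizeHex)) {
            return null; // malformed size line
        }
        $size = (int) hexdec($sizeHex);
        $pos = $eol + 2;
        if ($size === 0) {
            return $out; // terminating chunk seen
        }
        if ($pos + $size + 2 > $len || substr($body, $pos + $size, 2) !== "\r\n") {
            return null; // chunk data truncated or not followed by CRLF
        }
        $out .= substr($body, $pos, $size);
        $pos += $size + 2;
    }
}

$complete  = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";
$truncated = "5\r\nhello\r\n6\r\n wor";

var_dump(decodeChunkedStrict($complete));
var_dump(decodeChunkedStrict($truncated));
```

A `null` result here is a signal to request the page again before any gzip decoding is attempted.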

It works now. I scraped continuously for 10 minutes without any problems. Thanks a lot!

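Something went wrong during transmission and part of the data was lost, so decompression failed.

Add the files that need decompressing to a queue and check every 5 to 10 seconds whether each file has changed. If it has not changed, decompress it. If decompression fails, mark the file as failed and move on to the next one.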
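The queue idea described above could be sketched as follows. Everything here is hypothetical scaffolding: the function names, the status strings, and the use of file size as the "has it changed" test are assumptions made for illustration, and the wait is a parameter so it can be set to the suggested 5 to 10 seconds.

```php
<?php
// Hypothetical helper: true when the file size is the same before and after the wait.
function isFileStable(string $path, int $waitSeconds): bool
{
    clearstatcache(true, $path);
    $before = filesize($path);
    sleep($waitSeconds);
    clearstatcache(true, $path);
    return $before !== false && $before === filesize($path);
}

/**
 * Walk the queue once. Each path ends up as 'ok', 'failed' (marked, then
 * skipped) or 'pending' (missing or still changing, so try again later).
 */
function processQueue(array $paths, int $waitSeconds = 5): array
{
    $results = [];
    foreach ($paths as $path) {
        if (!is_file($path) || !isFileStable($path, $waitSeconds)) {
            $results[$path] = 'pending';
            continue;
        }
        $decoded = @gzdecode((string) file_get_contents($path));
        $results[$path] = $decoded === false ? 'failed' : 'ok';
    }
    return $results;
}

// Usage with two temporary files: one valid gzip file and one that is not gzip.
$valid = tempnam(sys_get_temp_dir(), 'gz');
$broken = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($valid, gzencode('page body'));
file_put_contents($broken, 'not gzip data');

$results = processQueue([$valid, $broken], 0); // 0-second wait keeps the demo fast
print_r(array_values($results));

unlink($valid);
unlink($broken);
```

Marking a file as failed rather than stopping keeps one bad download from blocking the rest of the queue.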
