An analysis of chunked encoding in the HTTP protocol

黄舟

Released: 2016-12-16 10:06:51

Few websites seem to use chunked encoding, apart from those that enable GZip compression, such as google.com and most PHP forums that turn GZip on.

As I understand it, the main benefit of chunked encoding is that a program can send content while it is still computing it.
For example, suppose a background job takes an hour to run, but you don't want the user to wait an hour before seeing anything. With chunked encoding the output can be sent piece by piece, so the user receives the latest results as they are produced.
In ASP, turning off buffered output (Response.Buffer = false) switches the response to chunked encoding.
Each Response.Write then becomes one chunk, so don't call it too often; otherwise there will be too many chunks and the extra framing bytes are wasted bandwidth.
If you want to study the exact structure of chunked encoding, turning off buffering in ASP is a very convenient way to experiment. :)
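
To put a number on the cost of many small chunks, here is a minimal Python sketch. The function name chunked_overhead is made up for this illustration, and it assumes no chunk extensions and no trailer headers. It counts only the framing bytes that chunked encoding adds on top of the data itself.

```python
def chunked_overhead(chunk_sizes):
    """Return the number of framing bytes added by chunked encoding.

    Assumes no chunk extensions and no trailer headers.
    """
    overhead = 0
    for size in chunk_sizes:
        if size <= 0:
            continue  # a zero-size chunk would terminate the body; skip it
        # hex size digits + CRLF before the data, then CRLF after the data
        overhead += len(format(size, "x")) + 2 + 2
    # last-chunk "0\r\n" plus the CRLF that closes the (empty) trailer
    overhead += 3 + 2
    return overhead


many_small = chunked_overhead([1] * 1000)  # 1000 one-byte writes
one_large = chunked_overhead([1000])       # a single 1000-byte write
print(many_small, one_large)               # 5005 12
```

Under these assumptions, sending 1000 bytes one byte at a time costs about five framing bytes per data byte, while a single 1000-byte chunk costs 12 bytes in total.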

Let's first look at the definition of chunked encoding in RFC 2616:
Chunked-Body    = *chunk
                  last-chunk
                  trailer
                  CRLF

chunk           = chunk-size [ chunk-extension ] CRLF
                  chunk-data CRLF
chunk-size      = 1*HEX
last-chunk      = 1*("0") [ chunk-extension ] CRLF

chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name  = token
chunk-ext-val   = token | quoted-string
chunk-data      = chunk-size(OCTET)
trailer         = *(entity-header CRLF)

Put more simply, the data is laid out like this:
[chunk size][CRLF][chunk data][CRLF][chunk size][CRLF][chunk data][CRLF][0][CRLF][optional trailer headers][CRLF]
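
The layout can be reproduced with a few lines of Python. This is only a sketch: encode_chunked is a hypothetical helper written for this article, and it sends no chunk extensions and no trailer headers.

```python
def encode_chunked(parts):
    """Frame an iterable of bytes objects as an HTTP/1.1 chunked body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # an empty chunk would be read as the terminating last-chunk
        out += format(len(part), "X").encode("ascii") + b"\r\n"  # chunk-size CRLF
        out += part + b"\r\n"                                    # chunk-data CRLF
    out += b"0\r\n"  # last-chunk
    out += b"\r\n"   # empty trailer, then the closing CRLF
    return bytes(out)


wire = encode_chunked([b"Hello", b", world"])
print(wire)  # b'5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n'
```

Empty parts are skipped on purpose, because a chunk of size 0 is how the receiver recognizes the end of the body.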

Note that chunk-size is written as hexadecimal digits in ASCII. For example, 86AE is sent as the four bytes 0x38 0x36 0x41 0x45 (lowercase 86ae would be 0x38 0x36 0x61 0x65), and it stands for a length of 34478, meaning that exactly 34478 bytes of chunk data follow the CRLF.
Tracing the response from www.yahoo.com, I found some extra spaces in the chunk-size field. The field may be padded to a fixed width of 7 bytes, with spaces (ASCII 0x20) filling the unused positions.
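
A size line can be parsed roughly as follows. This is a sketch, and parse_chunk_size is a name invented here. Stripping space padding is an assumption about lenient parsing, based on the observation above; the RFC 2616 grammar itself lists only hex digits.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_chunk_size(line):
    """Parse one chunk-size line (bytes, with or without the trailing CRLF)."""
    text = line.decode("ascii").rstrip("\r\n")
    # Drop any chunk-extension: everything from the first ";" onwards.
    size_field = text.split(";", 1)[0]
    # Tolerate space/tab padding around the digits (lenient, not required by the grammar).
    size_field = size_field.strip(" \t")
    if not size_field or any(c not in HEX_DIGITS for c in size_field):
        raise ValueError(f"invalid chunk-size: {line!r}")
    return int(size_field, 16)


print(parse_chunk_size(b"86AE\r\n"))           # 34478
print(parse_chunk_size(b"86ae   \r\n"))        # 34478
print(parse_chunk_size(b"1a;name=value\r\n"))  # 26
```

The explicit digit check is there because Python's int(..., 16) on its own would also accept forms such as "0x1a" or "1_a", which are not valid chunk sizes.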

The following pseudocode shows the decoding process:
length := 0                                          // length of the decoded body so far
read chunk-size, chunk-extension (if any) and CRLF   // read the first chunk size
while (chunk-size > 0) {                             // loop until a chunk size of 0 is read
    read chunk-data and CRLF                         // read the chunk data and the CRLF that ends it
    append chunk-data to entity-body                 // add the chunk data to the decoded body
    length := length + chunk-size                    // update the decoded length
    read chunk-size and CRLF                         // read the next chunk size
}
read entity-header                                   // from here on, read the trailer headers
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length                             // add the content length to the headers
Remove "chunked" from Transfer-Encoding              // drop the "chunked" token from Transfer-Encoding
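
The pseudocode translates fairly directly into Python. The version below is a sketch rather than a production parser. The function name decode_chunked is made up, and the headers are assumed to be a plain dict with exactly capitalized keys such as "Transfer-Encoding". Chunk extensions are ignored, and padding around the size is tolerated.

```python
import io


def decode_chunked(stream, headers):
    """Decode a chunked body from a binary stream, updating headers in place.

    Returns the decoded entity-body as bytes.
    """
    def read_size():
        line = stream.readline()
        if not line.endswith(b"\r\n"):
            raise ValueError("truncated chunk-size line")
        # Ignore any chunk-extension after ";" and tolerate padding spaces.
        return int(line.split(b";", 1)[0].decode("ascii").strip(), 16)

    body = bytearray()
    size = read_size()
    while size > 0:
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += data
        size = read_size()

    # Trailer: zero or more header lines, ended by an empty line
    # (end of input is also accepted here, which is lenient).
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip()] = value.strip()

    headers["Content-Length"] = str(len(body))
    codings = [c.strip() for c in headers.get("Transfer-Encoding", "").split(",")]
    remaining = [c for c in codings if c and c.lower() != "chunked"]
    if remaining:
        headers["Transfer-Encoding"] = ", ".join(remaining)
    else:
        headers.pop("Transfer-Encoding", None)
    return bytes(body)


raw = b"5\r\nHello\r\n7\r\n, world\r\n0\r\nX-Checksum: abc\r\n\r\n"
headers = {"Transfer-Encoding": "chunked"}
print(decode_chunked(io.BytesIO(raw), headers))  # b'Hello, world'
print(headers)
```

In this example the trailer header X-Checksum (an invented name) is merged into the header dict, Content-Length is set to 12, and Transfer-Encoding is removed because "chunked" was its only coding.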

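GZip combined with chunked encoding deserves a closer look. One might guess that each chunk is compressed independently, but that is not how it works: with Content-Encoding: gzip the body is compressed as a single GZip stream, and that stream is then split into chunks. The receiver first removes the chunk framing and then decompresses the reassembled data.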
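
In the usual arrangement (Content-Encoding: gzip together with Transfer-Encoding: chunked), compression and chunking are separate layers: the whole body is compressed as one GZip stream, and the chunk boundaries are then placed without regard to the compressed data. The Python sketch below illustrates this round trip. The helpers to_chunks and from_chunks are invented for the illustration and handle no chunk extensions or trailers.

```python
import gzip


def to_chunks(data, chunk_size):
    """Frame data as a chunked body using fixed-size chunks."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        piece = data[i:i + chunk_size]
        out += format(len(piece), "x").encode("ascii") + b"\r\n" + piece + b"\r\n"
    return bytes(out) + b"0\r\n\r\n"


def from_chunks(wire):
    """Strip chunk framing (no extensions or trailers expected in this sketch)."""
    body, pos = bytearray(), 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol].decode("ascii"), 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += wire[start:start + size]
        pos = start + size + 2  # skip the CRLF after chunk-data


original = b"chunked and gzip are separate layers. " * 50
compressed = gzip.compress(original)  # step 1: compress the whole body
wire = to_chunks(compressed, 16)      # step 2: cut the stream into chunks
restored = gzip.decompress(from_chunks(wire))  # receiver: de-chunk, then decompress
assert restored == original
```

Because the 16-byte chunks cut the compressed stream at arbitrary positions, an individual chunk is generally not a complete GZip stream; only the reassembled data is.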

Chunked encoding naturally carries a slight performance cost, because each chunk adds some framing bytes on top of the plain data body.
However, in some cases chunked output is unavoidable, for example when the total length is not known before sending starts.
