When crawling, you need to set Accept-Encoding in the request; otherwise the response will not be compressed.
In the browser, Accept-Encoding: gzip, deflate, sdch tells the website that the browser supports these three compression methods: gzip, deflate, and sdch. In other words, the header describes the compression methods the browser can handle, not the ones the website supports.
The website chooses one of those supported methods for the response and reports its choice as the value of Content-Encoding. The browser then selects the matching decompression method based on that value.
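To make the negotiation concrete, here is a minimal sketch of how a server might pick a Content-Encoding from the client's Accept-Encoding value. The function names and the `server_supported` tuple are hypothetical, q-values are ignored, and real servers apply their own preference rules; returning None for a missing header mirrors the behavior described in this answer rather than a general rule.

```python
def parse_accept_encoding(header_value):
    """Split an Accept-Encoding value into lowercase coding names.

    Simplification: any ';q=...' weight is dropped rather than honored.
    """
    codings = []
    for item in header_value.split(','):
        name = item.split(';', 1)[0].strip().lower()
        if name:
            codings.append(name)
    return codings


def choose_content_encoding(accept_encoding, server_supported=('gzip', 'deflate')):
    """Return the first coding listed by the client that the server supports.

    Returns None when nothing matches or the header is missing, meaning the
    body would be sent uncompressed (an assumption for this sketch).
    """
    if not accept_encoding:
        return None
    for coding in parse_accept_encoding(accept_encoding):
        if coding in server_supported:
            return coding
    return None
```

For example, `choose_content_encoding('gzip, deflate, sdch')` returns `'gzip'` with the default `server_supported`, while `choose_content_encoding(None)` returns `None`.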
Qiushibaike (the site requested in the script below) supports gzip, but if Accept-Encoding is not set, the response will not be compressed.
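Because the body may arrive compressed or uncompressed depending on the request headers, a crawler has to branch on Content-Encoding itself, since urllib does not decompress for you. The following is a minimal sketch of that step; `decode_body` is a hypothetical helper, it covers only gzip and deflate, and it returns anything else unchanged.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Decompress a response body according to its Content-Encoding value."""
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        try:
            # Most servers send zlib-wrapped deflate data.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No header or an unhandled coding: return the bytes as received.
    return body
```

With the response object `res` from the script in this answer, it could be used as `html = decode_body(res.read(), res.info().get('Content-Encoding'))`.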
#!/usr/bin/env python3
from urllib import request

USER_AGENT = r'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36'

# Advertise gzip support; without Accept-Encoding the site returns an uncompressed body.
req = request.Request(r'http://www.qiushibaike.com/', headers={'User-Agent': USER_AGENT, 'Accept-Encoding': 'gzip'})
res = request.urlopen(req)

# Print the compression method the server chose (None if the header is absent).
print(res.info().get('Content-Encoding'))
The output of the above script is