python - Encoding problems when parsing Chinese web pages with BeautifulSoup

Question

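For the same page and almost identical code, parsing works correctly under Python 3 on Windows 8. After porting the code to Ubuntu with Python 2.7, however, BeautifulSoup can no longer parse the fetched page, and find_all('table') returns an empty result.
Part of the problematic code (it runs):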

阿神 · Answer

Have you tried switching to a different parser?
The HTML parser that ships with Python 2.7 tolerates malformed markup poorly.
I recommend lxml.
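
To illustrate the parser suggestion, here is a minimal sketch in Python 3 syntax, assuming the bs4 package is installed. The helper name make_soup and the sample markup are made up for illustration; the helper tries lxml first and falls back to whichever parser is available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `parsers` that is installed.

    make_soup is a hypothetical helper, not part of bs4.
    """
    for name in parsers:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is missing
            continue
    raise RuntimeError("none of the requested parsers is installed")


# Sample markup with an unclosed <tr>, standing in for a real page
html = "<table><tr><td>cell</td></table>"
soup = make_soup(html)
tables = soup.find_all("table")
print(len(tables), tables[0].td.get_text())
```

Naming the parser explicitly also keeps the behaviour the same across machines, since bs4 otherwise picks whichever parser it finds installed.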

大家讲道理 · Answer

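Well, this is mainly an encoding problem... Python's encoding handling is a real pitfall if you have not fully understood it.
These lines all look somewhat problematic to me: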

1. mybytes = fp.read().decode('gbk').encode('utf-8')
2. soup = BeautifulSoup(mybytes,from_coding="uft-8")
3. print soup.original_encoding
4. print soup.prettify()

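Specifically:

1. No re-encoding is needed. bs accepts input in any encoding, and unicode is better still, so if you convert at all, stopping at decode is enough.
2. The constructor is used as BeautifulSoup(html, 'html5lib'): the second argument is the parser, not the encoding. (If an encoding hint is needed for byte input, the keyword is from_encoding; both from_coding and "uft-8" in the quoted line are misspelled.)
3. A plain print soup is enough to see the result. Whether the Chinese text displays correctly depends mostly on the encoding, and bs's own encoding conversion is not that robust, so relying on it without stating the encoding can also go wrong.
4. Only a call like soup.prettify('utf-8') guarantees that the output is encoded correctly.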
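
The points above could be put together roughly as follows. This is only a sketch in Python 3 syntax, assuming bs4 is installed; the inline GBK bytes stand in for fp.read(), and html.parser is used only because it ships with Python (lxml or html5lib can be substituted).

```python
from bs4 import BeautifulSoup

# Stand-in for fp.read(): a small GBK-encoded page (made-up sample data)
raw = "<html><body><table><tr><td>中文</td></tr></table></body></html>".encode("gbk")

# Point 1: decode once to unicode and stop there, no re-encoding to UTF-8
text = raw.decode("gbk")

# Point 2: the second argument names the parser
soup = BeautifulSoup(text, "html.parser")
tables = soup.find_all("table")
cell = tables[0].td.get_text()

# Alternative for byte input: pass the encoding with the from_encoding keyword
soup_from_bytes = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

# Point 4: prettify with an encoding returns bytes in that encoding
out = soup.prettify("utf-8")

print(len(tables), cell)
```

When the input is already unicode, soup.original_encoding is None, because bs4 had nothing to detect.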
