Web crawler - scraping a UTF-8 Lianjia page with Python 3, the Chinese text all comes out as “南京小区二手房(南京链家网)”
迷茫 2017-04-18 09:58:42


All replies (2)
阿神
# -*- coding: utf-8 -*-


import requests
from bs4 import BeautifulSoup

url = 'http://nj.lianjia.com/xiaoqu/'
html = requests.get(url)
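# Pass the raw bytes and tell BeautifulSoup they are UTF-8, so the
# encoding that requests guessed (which may even be None) is never used.
soup = BeautifulSoup(html.content, 'lxml', from_encoding='utf-8')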
title = soup.title.get_text()
print(title)

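1) Add the # -*- coding: utf-8 -*- declaration at the top of the file.
2) Handle the encoding of the returned response correctly: html.text is decoded with whatever charset requests infers from the HTTP headers, which is apparently not UTF-8 for this page.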
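
The mix-up behind point 2 can be reproduced without any network access. The sketch below assumes the server sends UTF-8 bytes and the client decodes them as Latin-1 (ISO-8859-1), which is what requests falls back to for text responses whose Content-Type header names no charset. The variable names are illustrative only.

```python
# The readable title, as quoted in the question.
expected = "南京小区二手房(南京链家网)"

# Assumption: the server sends the page as UTF-8 bytes.
raw = expected.encode("utf-8")

# Decoding those bytes with the wrong codec yields garbled text.
garbled = raw.decode("latin-1")

# Latin-1 maps every byte to exactly one code point, so the mistake
# is reversible: encode back to the same bytes, then decode as UTF-8.
recovered = garbled.encode("latin-1").decode("utf-8")

print(garbled == expected)    # False: the text is garbled
print(recovered == expected)  # True: the round trip restores it
```

Because the Latin-1 round trip is lossless, re-encoding the garbled string gives back exactly the bytes the server sent.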

小葫芦

The answer above is right. Alternatively, you can encode the garbled text back to bytes as Latin-1 and then decode those bytes as UTF-8.

import requests
from bs4 import BeautifulSoup

url = 'http://nj.lianjia.com/xiaoqu/'
html = requests.get(url)
soup = BeautifulSoup(html.text, 'lxml')
title = soup.title.get_text()
print(title.encode('latin1').decode('utf-8'))
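
Another option, not mentioned in either answer, is to set the encoding on the response before reading .text, so the text is decoded correctly the first time. This is a minimal sketch that assumes the page really is UTF-8. page_title is a hypothetical helper name, and html.parser is used in place of lxml only to avoid the extra dependency.

```python
import requests
from bs4 import BeautifulSoup


def page_title(response, encoding="utf-8"):
    """Return the <title> text of a requests response.

    Assumption: the body is encoded as `encoding`, regardless of
    what requests inferred from the HTTP headers.
    """
    # Override the guessed encoding; .text is decoded lazily from the
    # raw bytes, so this takes effect on the next access.
    response.encoding = encoding
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.title.get_text()
```

For example, page_title(requests.get('http://nj.lianjia.com/xiaoqu/')) should return the readable title if the UTF-8 assumption holds.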