from bs4 import BeautifulSoup
from bs4 import UnicodeDammit
import requests


def run():
    # Fetch the page; .text decodes the body using the encoding requests guessed
    soup = requests.get('http://zy.upln.cn/gongshi2014/index.html').text
    soup = BeautifulSoup(soup, 'html.parser')
    soup = soup.find('tbody')
    # Print the link text found in every table cell
    for x in soup.find_all('tr'):
        for y in x.find_all('td'):
            s = y.a.text
            print(s)


if __name__ == "__main__":
    run()
The text I get back is garbled. I'm not sure whether GBK content is being treated as UTF-8.
Any advice would be appreciated.
Personally, I recommend decoding the response body with the encoding that the response itself declares, rather than relying on a guessed default.
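As a rough illustration of that advice, here is a minimal sketch that decodes raw bytes using whatever charset the response declares. The helper name decode_body, its parameters, and the sample data are made up for this example; it checks the Content-Type header first and then a meta charset declaration, and falls back to UTF-8 as an assumption.

```python
import re


def decode_body(raw: bytes, content_type: str = "", fallback: str = "utf-8") -> str:
    """Decode raw response bytes using the charset the response declares.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    # 1. Prefer the charset parameter of the Content-Type header, if any.
    match = re.search(r"charset=([\w-]+)", content_type, re.IGNORECASE)
    if not match:
        # 2. Otherwise look for a charset declaration near the top of the HTML.
        head = raw[:2048].decode("ascii", errors="ignore")
        match = re.search(r"charset=[\"']?([\w-]+)", head, re.IGNORECASE)
    # 3. Fall back to an assumed default when nothing is declared.
    encoding = match.group(1) if match else fallback
    return raw.decode(encoding, errors="replace")


# Sample data standing in for a real response body.
utf8_page = '<html><head><meta charset="utf-8"></head><body>公示</body></html>'.encode("utf-8")
gbk_body = "公示".encode("gbk")

from_meta = decode_body(utf8_page, content_type="text/html")
from_header = decode_body(gbk_body, content_type="text/html; charset=gbk")
```

With requests, the same idea would be decode_body(resp.content, resp.headers.get("Content-Type", "")), using the raw bytes in resp.content instead of resp.text. Passing resp.content directly to BeautifulSoup is another option, since it tries to detect the encoding of byte input on its own.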
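Hello! I've run into a similar problem.
The fix is to change print(s) to print(s.encode('latin1').decode('utf-8')). This works when the page is actually UTF-8 but requests decoded it as ISO-8859-1, which is its fallback when the Content-Type header names no charset: encoding back to latin1 recovers the original bytes, which can then be decoded as UTF-8.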
This is the running result:
Good Luck!
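To see why the latin1 round trip in the answer above can work, here is a small self-contained sketch. The sample string is invented for the example, and it assumes the page bytes were UTF-8 that got decoded as Latin-1; if the page were really GBK, this trick would not apply.

```python
# A sample string standing in for the link text on the page.
original = "公示名单"

# Simulate the bug: UTF-8 bytes wrongly decoded as Latin-1 produce mojibake.
garbled = original.encode("utf-8").decode("latin1")

# Latin-1 maps every byte to exactly one character, so encoding the
# garbled text back to latin1 restores the original bytes unchanged.
repaired = garbled.encode("latin1").decode("utf-8")

print(repaired == original)
```

Setting the encoding on the response before reading .text avoids the mojibake in the first place, which is usually cleaner than repairing each string afterwards.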