Garbled Chinese characters when Python writes HTML files
When you use the open function to write HTML fetched by a crawler to a file, the Chinese text sometimes displays correctly in the console but is garbled in the HTML written to the file.
Case Analysis
Look at the following piece of code:
# Crawler that does not use cookies
from urllib import request

if __name__ == '__main__':
    url = "http://www.renren.com/967487029/profile"
    rsp = request.urlopen(url)
    html = rsp.read().decode()
    with open("rsp.html", "w") as f:
        # Write the crawled page to a file
        print(html)
        f.write(html)
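This code looks fine, and the HTML printed to the console shows the Chinese text correctly, but the Chinese in the created HTML file is garbled. The likely reason is that open, when called without an encoding argument, falls back to the platform's default text encoding, which on many Windows systems is not UTF-8, while the browser or editor that later opens the file reads it as UTF-8.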
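The mismatch between the encoding used to write the file and the encoding used to read it can be reproduced without any network access. The following is a minimal sketch, not the original author's code: the sample markup, the helper name write_then_read, and the choice of GBK as the "wrong" default are all assumptions made for illustration.

```python
import os
import tempfile

# Sample markup standing in for the crawled page (an assumption, not the real page)
text = "<p>你好，世界</p>"


def write_then_read(text, write_encoding, read_encoding):
    """Write text with one encoding, then read the file back with another."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        # errors="replace" mimics a viewer that shows placeholder characters
        # instead of failing on bytes it cannot decode
        with open(path, "r", encoding=read_encoding, errors="replace") as f:
            return f.read()
    finally:
        os.remove(path)


# Written as GBK but read as UTF-8: the Chinese comes back garbled
mismatched = write_then_read(text, "gbk", "utf-8")
# Written and read as UTF-8: the text survives unchanged
matched = write_then_read(text, "utf-8", "utf-8")

print(mismatched)
print(matched)
```

Only the matched case returns the original string, which is why making the write encoding explicit resolves the problem.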
Solution
Pass the encoding parameter to the open function: add encoding="utf-8" so the file is written as UTF-8 regardless of the platform default.
# Crawler that does not use cookies
from urllib import request

if __name__ == '__main__':
    url = "http://www.renren.com/967487029/profile"
    rsp = request.urlopen(url)
    html = rsp.read().decode()
    with open("rsp.html", "w", encoding="utf-8") as f:
        # Write the crawled page to a file
        print(html)
        f.write(html)
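To check whether the platform default is what caused the garbling, it may help to print the default encoding and confirm that the bytes on disk really are UTF-8. This is only a sketch under stated assumptions: save_html is a hypothetical helper, and the sample markup and temporary path stand in for the crawled page and rsp.html.

```python
import locale
import os
import tempfile

# The encoding open() normally falls back to when no encoding argument is given
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)


def save_html(html, path, encoding="utf-8"):
    """Hypothetical helper: write html to path with an explicit encoding."""
    # newline="" keeps line endings exactly as they are in the string
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(html)


# Sample markup standing in for the crawled page (an assumption)
html = '<meta charset="utf-8"><p>中文</p>'
path = os.path.join(tempfile.mkdtemp(), "rsp.html")
save_html(html, path)

# Read the raw bytes back to confirm the file holds UTF-8
with open(path, "rb") as f:
    raw = f.read()

print(raw == html.encode("utf-8"))
```

If the printed default encoding is something other than UTF-8 (for example cp936 on a Chinese-language Windows system), omitting the encoding argument would explain the garbled file.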
Thank you for reading. I hope you find this helpful.
This article is reproduced from: https://blog.csdn.net/qq_40147863/article/details/81746445