Garbled Chinese characters in Python 2, explained once and for all.
This guide is meant to spare beginners any further trouble with garbled Chinese characters in Python 2.
The details below come from Brother Huang, instructor of the Python training class at Diaim Company:
1. Declare the encoding of the source file you write
If a source file does not declare an encoding, Python 2 assumes the file is ASCII.
ASCII defines only 128 characters and none of them are Chinese, so a file that contains Chinese characters fails with an error.
So put #coding:utf-8 or #coding:gbk on the first or second line of the file.
In general, write #coding:utf-8.
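To make the ASCII limit concrete, here is a minimal sketch that runs on Python 2.7 and Python 3. The variable names are illustrative only. It declares the file encoding, then shows that the same text cannot be encoded as ASCII but can be encoded as UTF-8.

```python
# -*- coding: utf-8 -*-
# The declaration above has to sit on the first or second line of the file.
text = u'中文'  # a unicode string holding two Chinese characters

try:
    text.encode('ascii')       # ASCII has no code points for Chinese
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = text.encode('utf-8')  # UTF-8 can represent the same text

print(ascii_ok)
print(len(utf8_bytes))
```

Each of these two characters takes three bytes in UTF-8, which is why the byte length is larger than the character count.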
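2. Python 2 uses unicode as the common internal form for text
The unicode type is the form that every other encoding is converted to and from.
Unicode can represent the characters of every language in the world.
UTF-8 is one encoding of Unicode, which is why #coding:utf-8 is the usual declaration at the top of the file.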
3. Encoding conversion
Keep in mind that unicode is the intermediate form for every conversion in Python 2.
Call decode() on data in another encoding to get unicode, then call encode() to get the encoding you want. Done this way, the characters do not come out garbled.
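The decode-then-encode rule can be sketched as a small helper. The function name transcode and its parameters are hypothetical, chosen only for this illustration; the code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
def transcode(raw, source_encoding, target_encoding):
    """Convert encoded bytes from one encoding to another via unicode."""
    text = raw.decode(source_encoding)    # bytes -> unicode
    return text.encode(target_encoding)   # unicode -> bytes

gbk_bytes = u'中文'.encode('gbk')
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')
```

Going through unicode in the middle means the helper never needs a separate rule for each pair of encodings.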
4. When scraping web pages
Declare #coding:utf-8 in the code.
If the web page is encoded in gbk, convert it like this:
html = html.decode('gbk').encode('utf-8')
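When the page encoding is not known in advance, one possible approach is to try a short list of candidate encodings in order. This is only a sketch under that assumption, not a method from the original text; decode_html and its candidates parameter are hypothetical names. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
def decode_html(raw, candidates=('utf-8', 'gbk')):
    """Return unicode text, trying each candidate encoding in order."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    # Last resort: keep what can be decoded and mark the rest as unknown.
    return raw.decode(candidates[0], 'replace')

page_from_gbk_site = u'中文'.encode('gbk')
page_from_utf8_site = u'中文'.encode('utf-8')
```

UTF-8 is tried first because it rejects most byte sequences that are not valid UTF-8, whereas gbk accepts many byte sequences and could silently produce wrong characters.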
5. You can also write #coding:gbk at the top of the file, but then the file itself must also be saved as gbk. This situation typically comes up on Windows.
6. Chinese characters in dictionary keys or values
#coding:utf-8
dict1 = {1:'python weekend training class',2:'Consultation 010-68165761 QQ: 1465376564'}
print dict1
# When the values contain Chinese text, printing the whole dict does not show the Chinese characters; it shows their escaped byte values instead
dict2 = {1:'python video training class',2:'Consultation 010-68165761 QQ: 1465376564'}
# Printing each value on its own shows the characters as written
for key in dict2:
    print dict2[key]
7. Writing unicode Chinese text to a text file
Convert the unicode text to the encoding the text file uses.
Call encode('utf-8') or encode('gbk') as appropriate.
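A minimal sketch of the write step, assuming a UTF-8 text file. The file name demo.txt and the temporary directory are placeholders for illustration; the code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

text = u'中文'
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file name

# Open in binary mode and write the already-encoded bytes.
with open(path, 'wb') as handle:
    handle.write(text.encode('utf-8'))

# Read the bytes back and decode them with the same encoding.
with open(path, 'rb') as handle:
    restored = handle.read().decode('utf-8')
```

Swapping 'utf-8' for 'gbk' in both places gives a GBK file instead; what matters is that the writer and the reader agree.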
Summary: whenever an error message contains "ASCII", it means the encoding of the Chinese characters was not specified.
---- Get the encoding type of a string ----
>>> import urllib2
>>> date = urllib2.urlopen("http://www.baidu.com")
>>> d = date.read()
>>> import chardet
>>> chardet.detect(d)
{'confidence': 0.99, 'encoding': 'utf-8'}