This article uses examples to show how to convert between text encodings in Python. The details are as follows:
1: Python and unicode
To handle multilingual text correctly, Python introduced Unicode strings starting with version 2.0.
2: print in python
Although Python converts text to Unicode internally for processing, output to the terminal is still done with traditional Python byte strings. (In fact, Python's print statement cannot send Unicode characters to the terminal directly; they must first be encoded into bytes.)
When writing to the console, Python's print automatically encodes Unicode strings (byte strings in any other encoding are output as is), but the write method of a file object does not. Therefore, a string that prints correctly will not necessarily come out the same when it is written to a file.
Under Linux, the conversion follows the locale environment variables, which you can inspect with the locale command. The print statement encodes a Unicode string using that encoding and passes the resulting byte stream to the operating system; the terminal then interprets the bytes according to its own encoding.
>>> str = '学习python'
>>> str
'\xe5\xad\xa6\xe4\xb9\xa0python'    # byte string (UTF-8 bytes)
>>> print str
学习python
>>> str = u'学习python'
>>> str
u'\u5b66\u4e60python'    # Unicode string
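The difference between printing and writing to a file can be sketched as follows. This is only an illustration under stated assumptions: the file is assumed to be UTF-8, the helper name `write_text` and the temporary file name are hypothetical, and the block is written so that it should run on both Python 2.7 and Python 3 (where the byte string type is `bytes` and the Unicode string type is `str`). The escape `\u5b66\u4e60` stands for 学习.

```python
import io
import locale
import os
import sys
import tempfile

# Encoding print uses for the console; it can be None when output is piped.
console_encoding = getattr(sys.stdout, 'encoding', None)
# Encoding derived from the locale environment variables (LANG, LC_ALL, ...).
locale_encoding = locale.getpreferredencoding()


def write_text(path, text, encoding='utf-8'):
    """Hypothetical helper: write a Unicode string with an explicit encoding."""
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(text)


text = u'\u5b66\u4e60python'

# Assumed scratch location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
write_text(path, text)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()
```

Choosing the encoding explicitly when opening the file means the result no longer depends on the locale, which is the usual reason a string that prints correctly ends up different on disk.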
3: decode in python
decode converts text in other character sets into Unicode (only non-ASCII characters, such as Chinese characters, actually need to be converted).
>>> str = '学习'
>>> ustr = str.decode('utf-8')
>>> ustr
u'\u5b66\u4e60'
In this way, the Chinese characters are converted to Unicode and Python can process them further. (Without an explicit conversion, Python falls back on a default encoding, normally ASCII, for implicit conversions, so errors or garbled characters may appear.)
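What happens when the codec passed to decode does not match the bytes can be sketched as below. This is a minimal illustration, not an exhaustive treatment: the variable names are hypothetical, and the block is written so that it should run on both Python 2.7 and Python 3. The bytes used are the UTF-8 encoding of 学习.

```python
raw = b'\xe5\xad\xa6\xe4\xb9\xa0'

# The correct codec recovers the two characters.
text = raw.decode('utf-8')

# latin-1 maps every byte to a character, so the wrong codec does not fail;
# it silently produces six garbled characters instead.
garbled = raw.decode('latin-1')

# ASCII cannot represent bytes above 0x7f, so strict decoding raises.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# The 'replace' error handler substitutes U+FFFD for each undecodable byte.
replaced = raw.decode('ascii', 'replace')
```

A silent mismatch such as the latin-1 case is harder to notice than an exception, so it helps to know the source encoding instead of guessing it.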
4: encode in python
encode converts Unicode into other character sets.
>>> str = '学习'
>>> ustr = str.decode('utf-8')
>>> ustr
u'\u5b66\u4e60'
>>> ustr.encode('utf-8')
'\xe5\xad\xa6\xe4\xb9\xa0'
>>> print ustr.encode('utf-8')
学习
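Combining decode and encode gives a way to convert bytes from one character set to another by going through Unicode. The sketch below assumes GBK as the second character set purely for illustration; the helper name `transcode` is hypothetical, and the block is written so that it should run on both Python 2.7 and Python 3. The escape `\u5b66\u4e60` stands for 学习.

```python
text = u'\u5b66\u4e60'

utf8_bytes = text.encode('utf-8')  # three bytes per character for this text
gbk_bytes = text.encode('gbk')     # two bytes per character for this text

# Round trip: decoding with the same codec recovers the original text.
assert utf8_bytes.decode('utf-8') == text
assert gbk_bytes.decode('gbk') == text


def transcode(data, src, dst):
    """Hypothetical helper: re-encode bytes from src to dst via Unicode."""
    return data.decode(src).encode(dst)


converted = transcode(gbk_bytes, 'gbk', 'utf-8')

# Characters missing from the target charset raise UnicodeEncodeError unless
# an error handler is given; 'backslashreplace' keeps them as escapes.
ascii_safe = text.encode('ascii', 'backslashreplace')
```

Going through Unicode in the middle avoids needing a dedicated converter for every pair of character sets.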