In Python 2, the default str type is a byte string, so on its own it cannot fully represent international character sets. To work around this limitation, Python 2 provides a separate unicode type for text data. To write a Unicode string literal, put a u before the opening quotation mark, as in u'text'. Ordinary strings in Python 2 are therefore encoded (non-Unicode) byte strings. In Python 3 the prefix is unnecessary because every str is already Unicode. Python 3.0 to 3.2 rejected the u prefix as a syntax error; Python 3.3 and later accept it again as a no-op for compatibility (PEP 414).
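A quick check, assuming Python 3.3 or later: a u-prefixed literal is accepted and yields an ordinary str, identical to the unprefixed literal. The variable names are illustrative only.

```python
# Python 3.3+: u'' literals are allowed again (PEP 414) and mean the same as ''.
with_prefix = u'你好'
without_prefix = '你好'

print(type(with_prefix))              # <class 'str'>
print(with_prefix == without_prefix)  # True
```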
Python 3 does, however, have one more string-like type: bytes, written with a b prefix:
type(b'132')  # => <class 'bytes'>
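Python 2.6 and 2.7 also accept the b prefix, but there it simply produces an ordinary str (byte string); older Python 2 versions reject it as a syntax error.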
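The following Python 3 sketch illustrates how bytes differs from str: indexing a bytes object yields integers, and the two types are never combined implicitly. The names text, data and mixed_ok are illustrative.

```python
# Python 3: str holds Unicode text, bytes holds raw byte values (0-255).
text = '132'
data = b'132'

print(type(text))  # <class 'str'>
print(type(data))  # <class 'bytes'>

# Indexing bytes returns an int (the byte value), not a one-character string.
print(data[0])     # 49, the ASCII code of '1'
print(text[0])     # 1

# Mixing str and bytes raises TypeError instead of converting silently.
try:
    text + data
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print('TypeError:', exc)
```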
As a result, encode and decode are used quite differently in the two versions:
In Python 2, decode converts a str (byte string) into a unicode object.
In Python 3, decode converts a bytes object into a str.
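One way to see the split in Python 3, sketched below: each conversion direction exists on only one type, whereas Python 2's str and unicode types both carried encode and decode methods. The flag str_has_decode is an illustrative name.

```python
# Python 3: str.encode() goes str -> bytes, bytes.decode() goes bytes -> str.
print(hasattr(str, 'encode'), hasattr(str, 'decode'))      # True False
print(hasattr(bytes, 'decode'), hasattr(bytes, 'encode'))  # True False

# Calling decode on a str fails outright instead of guessing an encoding.
try:
    'abc'.decode('utf-8')
    str_has_decode = True
except AttributeError as exc:
    str_has_decode = False
    print('AttributeError:', exc)
```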
Taking Python 3 as an example:
src = '你好世界'  # "Hello, world"
At this point src is a str. To convert it to bytes, encode it:
src = src.encode('utf-8')
Now src is a bytes object. To convert it back to str, decode it:
src = src.decode('utf-8')  # decode() also defaults to UTF-8 in Python 3
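The round trip can be checked directly. This sketch also shows that a str and its UTF-8 bytes have different lengths: each of these four Chinese characters takes three bytes in UTF-8. The names encoded and decoded are illustrative.

```python
src = '你好世界'                 # str: 4 characters
encoded = src.encode('utf-8')   # bytes: 3 bytes per character here

print(type(encoded))            # <class 'bytes'>
print(len(src), len(encoded))   # 4 12
print(encoded[:3])              # b'\xe4\xbd\xa0', the UTF-8 bytes of '你'

decoded = encoded.decode('utf-8')
print(decoded == src)           # True
```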
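In addition, open() in Python 3 has an encoding parameter. When it is omitted, the default is platform-dependent (the locale's preferred encoding: usually UTF-8 on Linux and macOS, but often a legacy code page such as cp1252 on Windows), so passing encoding='utf-8' explicitly is safer. A file opened in text mode reads and writes str objects only.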
Writing bytes to a file opened in text mode therefore raises a TypeError, for example:
with open('a.bin', 'w') as f: f.write(b'xxx')  # TypeError: write() argument must be str, not bytes
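A minimal sketch of text-mode file I/O with the encoding passed explicitly. The temporary directory and the file name hello.txt are illustrative assumptions; locale.getpreferredencoding(False) reports the locale encoding that text-mode open() typically falls back to when no encoding is given.

```python
import locale
import os
import tempfile

# Locale encoding that text-mode open() typically uses when encoding= is omitted.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hello.txt')  # hypothetical file name

    # Text mode: write() takes str; encoding='utf-8' makes the bytes on disk portable.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('你好世界')

    # Reading back in text mode returns str, decoded with the same codec.
    with open(path, encoding='utf-8') as f:
        content = f.read()

    # The file itself holds UTF-8 bytes: 4 characters x 3 bytes each.
    size = os.path.getsize(path)

print(type(content))           # <class 'str'>
print(content == '你好世界')    # True
print(size)                    # 12
```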
To read or write binary data, open the file in binary mode, 'rb' or 'wb'; such files take and return bytes rather than str.
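The binary-mode counterpart, sketched with a temporary copy of the a.bin file from the example above: 'wb' and 'rb' pass bytes through unchanged, while text mode rejects them. The byte contents and the flag text_mode_error are illustrative.

```python
import os
import tempfile

text_mode_error = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'a.bin')

    # Binary write: write() takes bytes, and no encoding is applied.
    with open(path, 'wb') as f:
        f.write(b'\x00\x01xxx')

    # Binary read: the exact bytes come back as a bytes object.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode, by contrast, refuses bytes.
    try:
        with open(path, 'w') as f:
            f.write(b'xxx')
    except TypeError as exc:
        text_mode_error = True
        print('TypeError:', exc)

print(raw)              # b'\x00\x01xxx'
print(text_mode_error)  # True
```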
Similarly, if a crawled web page shows garbled characters, the raw bytes were probably decoded with the wrong encoding; decode them with the charset the page actually uses (for example, GBK or UTF-8).
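A sketch of the transcoding step, assuming the page was served as GBK. There is no network access here: the body variable stands in for the raw bytes a crawler downloads (with the requests library, for instance, that is response.content).

```python
# Stand-in for a downloaded page body that the server encoded as GBK.
body = '你好世界'.encode('gbk')
print(len(body))  # 8: two bytes per character in GBK

# Decoding with the wrong codec yields garbled text (U+FFFD replacement chars here).
wrong = body.decode('utf-8', errors='replace')

# Decoding with the page's actual charset recovers the original text.
right = body.decode('gbk')

print('\ufffd' in wrong)    # True
print(right == '你好世界')   # True
```

In practice the charset usually comes from the Content-Type response header or a meta charset tag in the page itself.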
The above is the full text of "The difference between python2 and python3 strings". For more information, see the related articles on the PHP Chinese website.