One of the most important improvements in Python 3 is that it fixes the big pit left by string and character encoding in Python 2. Why is encoding in Python so painful? It comes down to two flaws in Python 2's string design:
- Using ASCII as the default encoding, which is very unfriendly to Chinese text processing.
- Awkwardly splitting strings into two types, unicode and str, which misleads developers.
Of course, these are not bugs. As long as you pay attention when processing text, you can avoid the pitfalls. But Python 3 solves both problems very well.
First, Python 3 sets the system default encoding to UTF-8:
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
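As a small sketch of what this default means in practice, the script below checks that calling encode() with no argument produces the same bytes as naming UTF-8 explicitly. The variable names are illustrative only and are not part of the original article.

```python
import sys

text = "禅"

# Python 3 reports UTF-8 as its default string encoding.
default_encoding = sys.getdefaultencoding()

# encode() with no argument uses UTF-8, so both calls give the same bytes.
implicit = text.encode()
explicit = text.encode("utf-8")

print(default_encoding)
print(implicit == explicit)
```

Naming the encoding explicitly is still a reasonable habit, since it makes the intent visible to readers of the code.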
Then, text and binary data are distinguished more clearly, represented by str and bytes respectively. All text is represented by the str type, which can hold any character in the Unicode character set, while binary data is represented by a new data type, bytes.
>>> a = "a"
>>> a
'a'
>>> type(a)
<class 'str'>
>>> b = "禅"
>>> b
'禅'
>>> type(b)
<class 'str'>
>>> c = b'a'
>>> c
b'a'
>>> type(c)
<class 'bytes'>
>>> d = b'\xe7\xa6\x85'
>>> d
b'\xe7\xa6\x85'
>>> type(d)
<class 'bytes'>
>>> e = b'禅'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> b"a"+b"c"
b'ac'
>>> b"a"*2
b'aa'
>>> b"abcdef\xd6"[1:]
b'bcdef\xd6'
>>> b"abcdef\xd6"[-1]
214
>>> b"a" + "b"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
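The differences between str and bytes shown above can be gathered into one short script. This is only a sketch with illustrative variable names: it compares character count with byte count, shows that indexing bytes yields an integer, and converts explicitly before concatenating.

```python
text = "禅"
data = text.encode("utf-8")  # b'\xe7\xa6\x85'

# A str is measured in characters, a bytes object in bytes.
char_count = len(text)
byte_count = len(data)

# Indexing bytes yields an int; slicing yields another bytes object.
first_byte = data[0]
first_slice = data[:1]

# Mixing the two types raises TypeError instead of converting silently.
try:
    b"a" + "b"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting explicitly makes the concatenation work.
joined = b"a" + "b".encode("utf-8")

print(char_count, byte_count, first_byte, first_slice, mixed_ok, joined)
```

Because Python 3 refuses to mix the two types, an encoding mistake surfaces at the point where text and bytes meet rather than somewhere later in the program.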
encode is responsible for character to byte encoding conversion. By default, UTF-8 encoding is used.
>>> s = "Python之禅"
>>> s.encode()
b'Python\xe4\xb9\x8b\xe7\xa6\x85'
>>> s.encode("gbk")
b'Python\xd6\xae\xec\xf8'
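decode is responsible for the reverse conversion, from bytes back to characters. By default, UTF-8 is also used.
>>> b'Python\xe4\xb9\x8b\xe7\xa6\x85'.decode()
'Python之禅'
>>> b'Python\xd6\xae\xec\xf8'.decode("gbk")
'Python之禅'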
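The encode and decode calls above can be combined into a round trip. The sketch below, with illustrative variable names, checks that decoding with the same codec used for encoding recovers the original text. It also shows that decoding these particular GBK bytes as UTF-8 raises UnicodeDecodeError.

```python
s = "Python之禅"

utf8_bytes = s.encode("utf-8")
gbk_bytes = s.encode("gbk")

# Decoding with the codec that produced the bytes gives back the text.
utf8_round_trip = utf8_bytes.decode("utf-8")
gbk_round_trip = gbk_bytes.decode("gbk")

# Decoding with the wrong codec fails for these bytes.
try:
    gbk_bytes.decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(utf8_round_trip == s, gbk_round_trip == s, mismatch_failed)
```

A mismatched codec does not always raise an error. Some byte sequences are valid in both encodings and simply decode to the wrong characters, so it is safer to keep track of which encoding produced a given bytes object.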