This chapter introduces how to write and process strings in Python. Having sorted out the troublesome issue of character encoding, we can now study Python strings.
In Python 3, strings are Unicode, which means that Python strings support multiple languages, for example:
>>> print('包含中文的str')
包含中文的str
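Because a str is a sequence of Unicode characters rather than bytes, ordinary string operations work per character regardless of language. A minimal sketch, reusing the string from the example above:

```python
text = '包含中文的str'

# len() counts characters, not bytes: 5 Chinese characters + 3 Latin letters
print(len(text))    # 8
# Slicing also works per character
print(text[:4])     # 包含中文
# Reversing keeps each character intact
print(text[::-1])   # rts的文中含包
```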
For the encoding of a single character, Python provides the ord() function, which returns the integer code of a character, and the chr() function, which converts an integer code back into the corresponding character:
>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
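Since ord() and chr() are inverses, applying them to every character of a string gives a lossless round trip. This is a minimal sketch; the helper names code_points and from_code_points are hypothetical, not part of Python:

```python
def code_points(text):
    """Hypothetical helper: return the integer code of each character in text."""
    return [ord(ch) for ch in text]

def from_code_points(codes):
    """Hypothetical helper: rebuild a string from a list of integer codes."""
    return ''.join(chr(c) for c in codes)

codes = code_points('A中文')
print(codes)                    # [65, 20013, 25991]
print(from_code_points(codes))  # A中文
```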
If you know a character's integer code, you can also write a str using hexadecimal escapes:
>>> '\u4e2d\u6587'
'中文'
The two forms, '\u4e2d\u6587' and '中文', are completely equivalent.
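The equivalence can be checked directly, and the escape form can be produced from ord() by formatting the code in hexadecimal. A minimal sketch: to_unicode_escape is a hypothetical helper, and it assumes the standard literal rules that \u takes exactly four hex digits while code points above U+FFFF need the eight-digit \U form:

```python
escaped = '\u4e2d\u6587'
print(escaped == '中文')  # True

def to_unicode_escape(text):
    """Hypothetical helper: write each character as a \\uXXXX or \\UXXXXXXXX escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append('\\u{:04x}'.format(cp))   # four hex digits
        else:
            parts.append('\\U{:08x}'.format(cp))   # eight hex digits for larger code points
    return ''.join(parts)

print(to_unicode_escape('中文'))  # \u4e2d\u6587
```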
Python's string type is str, which is represented in memory as Unicode, where one character may correspond to several bytes. To transmit a str over the network or save it to disk, you need to convert it into bytes, a type measured in units of bytes.
Python writes bytes data using single or double quotes with a b prefix:
x = b'ABC'
Be careful to distinguish 'ABC' from b'ABC'. The former is a str; the latter is bytes. Although the bytes object displays the same content as the str, each of its characters occupies exactly one byte.
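The difference shows up as soon as you inspect the two values: they have different types, they do not compare equal, and indexing a bytes object yields an integer (one byte) rather than a one-character string. A minimal sketch:

```python
s = 'ABC'
b = b'ABC'

print(type(s), type(b))  # <class 'str'> <class 'bytes'>
print(s == b)            # False: a str never equals a bytes object
print(s[0])              # A: indexing a str gives a one-character str
print(b[0])              # 65: indexing bytes gives the integer value of one byte
print(len(b))            # 3: one byte per character
```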
A str represented in Unicode can be encoded into bytes in a specified encoding with the encode() method, for example:
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
A str containing only English characters can be encoded into bytes with ASCII, and the content stays the same. A str containing Chinese can be encoded into bytes with UTF-8, but not with ASCII: Chinese characters lie outside the range of ASCII, so Python raises an error.
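To see how the choice of encoding affects the result, the sketch below measures how many bytes a string needs in a given encoding and reports None when the encoding cannot represent it. encoded_size is a hypothetical helper; the byte counts follow from the outputs shown above:

```python
def encoded_size(text, encoding):
    """Hypothetical helper: number of bytes text occupies in encoding,
    or None if the encoding cannot represent it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

print(encoded_size('ABC', 'ascii'))   # 3
print(encoded_size('ABC', 'utf-8'))   # 3: UTF-8 matches ASCII for English text
print(encoded_size('中文', 'utf-8'))  # 6: each of these characters takes 3 bytes
print(encoded_size('中文', 'ascii'))  # None: outside ASCII's range 0-127
```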
When a bytes object is displayed, any byte that cannot be shown as a printable ASCII character is written as \x## (two hexadecimal digits).
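The \x## notation is only a display convention: each escape stands for one byte value. A minimal sketch showing the same UTF-8 bytes as a display string, as a list of integers, and as a plain hex string (bytes.hex() is available in Python 3.5 and later):

```python
data = '中文'.encode('utf-8')
print(data)        # b'\xe4\xb8\xad\xe6\x96\x87'
print(list(data))  # [228, 184, 173, 230, 150, 135]: the same six bytes as integers
print(data.hex())  # e4b8ade69687

mixed = 'A中'.encode('utf-8')
print(mixed)       # b'A\xe4\xb8\xad': the printable ASCII byte is shown as 'A'
```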
Conversely, when we read a byte stream from the network or disk, the data we get is bytes. To turn bytes into a str, use the decode() method:
>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
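encode() and decode() are inverses as long as both sides use the same encoding. A minimal sketch of that round trip; round_trip is a hypothetical helper used only for illustration:

```python
def round_trip(text, encoding='utf-8'):
    """Hypothetical helper: encode text to bytes, then decode it back."""
    return text.encode(encoding).decode(encoding)

print(round_trip('中文'))          # 中文
print(round_trip('ABC', 'ascii'))  # ABC
```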
If the bytes object contains a byte sequence that cannot be decoded, decode() raises an error:
>>> b'\xe4\xb8\xad\xff'.decode('utf-8')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte
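When input may be malformed, one option is to catch UnicodeDecodeError and fall back to decode()'s errors parameter. A minimal sketch, assuming that replacing bad bytes is acceptable for your data; safe_decode is a hypothetical helper, and errors='replace' substitutes U+FFFD for each undecodable byte:

```python
def safe_decode(data, encoding='utf-8'):
    """Hypothetical helper: decode strictly, and fall back to errors='replace'
    (undecodable bytes become U+FFFD) if strict decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the position of the first bad byte; exc.reason explains why
        print(f'decode failed at byte {exc.start}: {exc.reason}')
        return data.decode(encoding, errors='replace')

result = safe_decode(b'\xe4\xb8\xad\xff')  # prints: decode failed at byte 3: invalid start byte
print(repr(result))                        # '中\ufffd'
```

Passing errors='ignore' instead simply drops the undecodable bytes, which silently loses data, so replacing them is usually easier to spot.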