Things to do with Python strings

Tomorin
Release: 2018-08-23 17:47:29

This chapter covers how to write and process Python strings. Now that the troublesome character-encoding issue is out of the way, we can turn to Python strings themselves.

In Python 3, strings are Unicode text, which means that a Python string can hold characters from many languages, for example:

>>> print('包含中文的str')
包含中文的str

For a single character, Python provides the ord() function to obtain the character's integer code point, and the chr() function to convert a code point back into the corresponding character:

>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
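
As a minimal sketch of how ord() and chr() mirror each other, the two helpers below (code_points and from_code_points are names made up for this illustration, not built-ins) take a string apart into code points and put it back together:

```python
def code_points(text):
    """Return the Unicode code point of each character in text."""
    return [ord(ch) for ch in text]


def from_code_points(points):
    """Rebuild a string from a sequence of code points."""
    return "".join(chr(p) for p in points)


points = code_points("中文")
print(points)                    # [20013, 25991]
print(from_code_points(points))  # 中文
```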

If you know a character's integer code point, you can also write the str using hexadecimal \u escapes:

>>> '\u4e2d\u6587'
'中文'

The two ways of writing the string are completely equivalent.
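
The equivalence can be checked directly. The sketch below also derives the escaped form from ord(). The helper to_escape is a hypothetical name used only here, and it assumes every character fits in four hexadecimal digits:

```python
def to_escape(text):
    # Format each code point as a backslash, the letter u, and 4 hex digits.
    return "".join("\\u{:04x}".format(ord(ch)) for ch in text)


print('\u4e2d\u6587' == '中文')  # True
print(hex(ord('中')))            # 0x4e2d
print(to_escape('中文'))         # \u4e2d\u6587
```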

Python's string type is str, which is represented as Unicode in memory, where one character may correspond to several bytes. To transmit a string over the network or save it to disk, you need to convert the str into bytes, a sequence of individual bytes.

Python writes bytes data as single-quoted or double-quoted text with a b prefix:

x = b'ABC'

Be careful to distinguish between 'ABC' and b'ABC'. The former is a str. The latter is bytes: its content is displayed the same way as the former, but each character of a bytes object occupies only one byte.
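
A short sketch of how the two types differ in practice (the variable names text and data are chosen only for this example):

```python
text = 'ABC'
data = b'ABC'

print(type(text).__name__, type(data).__name__)  # str bytes
print(text == data)      # False: a str never equals a bytes object
print(text[0], data[0])  # A 65: indexing bytes yields an integer
print(len('中文'), len('中文'.encode('utf-8')))  # 2 6
```

The last line counts two characters in the str but six bytes once the same text is encoded as UTF-8.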

A str can be encoded into bytes with a specified encoding through the encode() method, for example:

>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

A str of pure English text can be encoded into bytes using ASCII, and the content looks the same. A str containing Chinese can be encoded into bytes using UTF-8. A str containing Chinese cannot be encoded in ASCII, because Chinese code points fall outside the ASCII range, so Python raises an error.
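
To illustrate the rule, here is a sketch of a hypothetical helper, encode_text, that tries ASCII first and falls back to UTF-8 when Python raises UnicodeEncodeError. It is for demonstration only: UTF-8 encodes ASCII characters to the same bytes, so real code can usually call encode('utf-8') directly.

```python
def encode_text(text):
    """Encode with ASCII when possible, otherwise fall back to UTF-8."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError:
        # At least one character lies outside the ASCII range.
        return text.encode('utf-8')


print(encode_text('ABC'))   # b'ABC'
print(encode_text('中文'))  # b'\xe4\xb8\xad\xe6\x96\x87'
```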

In bytes, any byte that cannot be displayed as an ASCII character is shown as \x##, where ## is the byte's value in hexadecimal.
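
A small sketch of that display rule, using a string that mixes one ASCII character with one Chinese character (the example string is chosen only for illustration):

```python
data = 'A中'.encode('utf-8')

print(data)        # b'A\xe4\xb8\xad'
print(list(data))  # [65, 228, 184, 173]
print(data.hex())  # 41e4b8ad
```

The byte 65 is printed as A because it is a displayable ASCII character, while the three bytes that encode 中 appear as \x escapes.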

Conversely, if we read a byte stream from the network or disk, the data read is bytes. To change bytes into str, you need to use the decode() method:

>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
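
To tie encode() and decode() together, the sketch below saves a str to disk as bytes and reads it back. The scratch directory and the file name demo.bin are assumptions made for this illustration:

```python
import os
import tempfile

text = '中文'
data = text.encode('utf-8')   # str -> bytes before writing

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as f:   # binary mode accepts bytes, not str
    f.write(data)

with open(path, 'rb') as f:
    raw = f.read()            # what comes back is bytes

print(raw)                    # b'\xe4\xb8\xad\xe6\x96\x87'
print(raw.decode('utf-8'))    # 中文
```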

If bytes contains bytes that cannot be decoded, the decode() method will report an error:

>>> b'\xe4\xb8\xad\xff'.decode('utf-8')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte
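
When only a few bytes are invalid, one option is the errors parameter of decode(). This is a sketch of two of its standard values, not a recommendation to hide decoding problems: 'ignore' silently drops the bad bytes, and 'replace' substitutes the Unicode replacement character U+FFFD.

```python
broken = b'\xe4\xb8\xad\xff'

ignored = broken.decode('utf-8', errors='ignore')
replaced = broken.decode('utf-8', errors='replace')

print(ignored)                    # 中
print(replaced == '中\ufffd')     # True
```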
