Character | ASCII | Unicode | UTF-8
A | 01000001 | 00000000 01000001 | 01000001
中 | x | 01001110 00101101 | 11100100 10111000 10101101
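The sketch below reproduces the bit patterns in the table using only built-in codecs. It assumes that "Unicode" in the table means the fixed 2-byte form, which for these two characters is the same as UTF-16 big-endian (`utf-16-be`). The helper `bits` is a hypothetical name used only for this illustration.

```python
def bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, separated by spaces."""
    return ' '.join(format(b, '08b') for b in data)

for ch in ['A', '中']:
    try:
        ascii_bits = bits(ch.encode('ascii'))
    except UnicodeEncodeError:
        ascii_bits = 'x'  # not representable in ASCII
    # Assumption: 2-byte "Unicode" column == UTF-16 big-endian for these characters.
    unicode_bits = bits(ch.encode('utf-16-be'))
    utf8_bits = bits(ch.encode('utf-8'))
    print(ch, '|', ascii_bits, '|', unicode_bits, '|', utf8_bits)
```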
1) If the text to be transmitted consists mostly of English characters, encoding it in UTF-8 saves space, because each ASCII character takes 1 byte instead of 2.
2) ASCII can be regarded as a subset of UTF-8. A large amount of legacy software that supports only ASCII can therefore keep working under UTF-8.
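Both points can be checked with a small sketch. It uses `utf-16-le` (2 bytes per character here, no BOM) as a stand-in for the fixed-width Unicode form. That choice is an assumption made for the comparison.

```python
text = 'Hello, world'  # mostly English text, 12 characters

utf8 = text.encode('utf-8')
utf16 = text.encode('utf-16-le')  # assumption: stand-in for 2-byte Unicode

print(len(utf8), len(utf16))  # 12 24

# Point 2: pure-ASCII text produces identical bytes in ASCII and UTF-8,
# so ASCII-only software can read this UTF-8 output unchanged.
print(utf8 == text.encode('ascii'))  # True
print(utf8.decode('ascii'))          # Hello, world
```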
How computer systems commonly handle character encoding:
Memory: Unicode is used uniformly.
Hard disk and transmission: text is converted to UTF-8.
When you browse the web, the server converts dynamically generated Unicode content to UTF-8 before transmitting it to the browser.
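Here is a minimal sketch of the "Unicode in memory, UTF-8 on disk" split. A temporary file stands in for disk storage or transmission, and the file name `demo.txt` is hypothetical.

```python
import os
import tempfile

text = '中文 ABC'  # in memory: a str of Unicode characters

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # On disk: write the str out as UTF-8 (encoding given explicitly).
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # The file holds raw UTF-8 bytes...
    with open(path, 'rb') as f:
        raw = f.read()

    # ...which become Unicode again when read back with the same encoding.
    with open(path, 'r', encoding='utf-8') as f:
        restored = f.read()

print(raw)               # b'\xe4\xb8\xad\xe6\x96\x87 ABC'
print(restored == text)  # True
```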
Python string
Related functions
ord() function: returns the integer (Unicode code point) of a single character. The argument is the character to convert, and the result is an integer.
chr() function: converts an integer code point to the corresponding single character.
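A few quick calls show how `ord()` and `chr()` relate. The specific characters are arbitrary examples.

```python
print(ord('A'))    # 65
print(ord('中'))   # 20013
print(chr(66))     # B
print(chr(25991))  # 文

# The two functions are inverses of each other.
print(chr(ord('中')) == '中')  # True

# ord() accepts exactly one character; a longer string raises TypeError.
try:
    ord('AB')
    single_only = False
except TypeError:
    single_only = True
print(single_only)  # True
```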
encode() method: converts a str to bytes using the encoding passed as the argument.
'str'.encode('ascii') or 'str'.encode('utf-8') returns a bytes object.
Encoding Chinese text with 'ascii' raises an error.
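This sketch shows `encode()` succeeding with UTF-8 and failing with ASCII on Chinese text. It uses only the built-in `str.encode` and the standard `UnicodeEncodeError` exception.

```python
a = 'ABC'.encode('ascii')
u = '中文'.encode('utf-8')
print(a)  # b'ABC'
print(u)  # b'\xe4\xb8\xad\xe6\x96\x87'

# Chinese characters lie outside ASCII's range 0-127, so this raises.
try:
    '中文'.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True
print(failed)  # True
```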
'bytes'.decode('ascii') or 'bytes'.decode('utf-8') returns a str.
If the bytes cannot be decoded, an error is raised. If only a small number of bytes are invalid, you can pass errors='ignore' to skip them:
>>> b'\xe4\xb8\xad\xff'.decode('utf-8', errors='ignore')
'中'
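The sketch below contrasts the default strict decoding with `errors='ignore'` on the same input. The byte `\xff` never appears in valid UTF-8, so it serves as the invalid byte.

```python
data = b'\xe4\xb8\xad\xff'  # UTF-8 for '中' followed by one invalid byte

# Default (errors='strict'): any invalid byte raises UnicodeDecodeError.
try:
    data.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False

# errors='ignore' silently drops the invalid byte.
lenient = data.decode('utf-8', errors='ignore')
print(lenient)  # 中

# Fully valid bytes decode normally.
print(b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8'))  # 中文
```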
In Python 3, strings are stored as Unicode, so Python strings support multiple languages. Python's string type is str. To transmit a string over the network or save it to disk, you need to convert the str to bytes. To avoid garbled text, always use UTF-8 when converting between str and bytes.
The difference between str and bytes:
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
In a bytes value, any byte that cannot be displayed as an ASCII character is shown as \x##.
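To make the str/bytes difference concrete, the sketch below compares the type and length of a str with its UTF-8 bytes. It also shows how bytes are displayed: printable ASCII bytes appear as themselves and the rest appear as `\x##`.

```python
s = '中文'
b = s.encode('utf-8')

print(type(s), len(s))  # <class 'str'> 2    (2 characters)
print(type(b), len(b))  # <class 'bytes'> 6  (3 bytes per character)

# ASCII-displayable bytes print as-is; the others print as \x##.
mixed = b'AB' + b
print(mixed)  # b'AB\xe4\xb8\xad\xe6\x96\x87'
```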
If a .py file contains Chinese characters, save it in UTF-8 and put these two lines at the top:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
The first comment line tells Linux/OS X that this is an executable Python program; Windows ignores it.
The second comment line tells the Python interpreter to read the source code as UTF-8; otherwise, Chinese text you write in the source code may come out garbled.
Make sure your editor saves the file as UTF-8 without BOM.
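Here is a minimal example of a complete script file with both header lines in place. It assumes the file is saved as UTF-8 without BOM, and the greeting text is arbitrary.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Line 1 lets Linux/OS X run the file directly (after chmod +x).
# Line 2 tells the interpreter to read this file as UTF-8.

greeting = '你好，世界'  # non-ASCII literal written directly in the source
print(greeting)
print(len(greeting))  # 5
```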
String formatting
>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'
The % operator is used to format strings. The string contains some number of %? placeholders, followed by the same number of variables or values in the same order. If there is only one %?, the parentheses can be omitted.
To include a literal %, escape it as %%:
>>> 'growth rate: %d %%' % 7
'growth rate: 7 %'
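The sketch below collects the `%` rules into one runnable block. It covers ordering, the omitted parentheses for a single value, and the `%%` escape. It also adds one well-known edge case: a single value that is itself a tuple must be wrapped in a one-element tuple.

```python
a = 'Hello, %s' % 'world'                           # one value: no parentheses
b = 'Hi, %s, you have $%d.' % ('Michael', 1000000)  # values in placeholder order
c = 'growth rate: %d %%' % 7                        # %% gives a literal %
print(a)
print(b)
print(c)

# Edge case: if the single value is itself a tuple, wrap it.
t = 'point: %s' % ((1, 2),)
print(t)  # point: (1, 2)

# Too few values for the placeholders raises TypeError.
try:
    'Hi, %s, you have $%d.' % ('Michael',)
    mismatch_ok = True
except TypeError:
    mismatch_ok = False
print(mismatch_ok)  # False
```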
Placeholder | Replacement content
%d | Integer
%f | Floating-point number
%s | String
%x | Hexadecimal integer
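One example per placeholder from the table follows, plus the optional width and precision that can sit between `%` and the type letter. The width/precision part goes slightly beyond the table and is included only as an illustration.

```python
d = '%d' % 42      # '42'
f = '%f' % 3.5     # '3.500000' (6 decimal places by default)
s = '%s' % 'abc'   # 'abc'
x = '%x' % 255     # 'ff'

# Width, zero-padding and precision go between % and the type letter.
padded = '%2d-%02d' % (3, 1)  # ' 3-01'
pi = '%.2f' % 3.1415926       # '3.14'

# %s accepts any value and applies str() to it.
anything = '%s' % [1, 2]      # '[1, 2]'

for result in (d, f, s, x, padded, pi, anything):
    print(repr(result))
```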
format()
Another way to format a string is the string's format() method. It replaces the placeholders {0}, {1}, ... in the string with the passed-in arguments in order. However, this is considerably more verbose to write than %:
>>> 'Hello, {0}, the score has improved by {1:.1f}%'.format('Xiao Ming', 17.125)
'Hello, Xiao Ming, the score has improved by 17.1%'
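This sketch compares `format()` with the `%` operator on the same example. It also shows that a positional index can be reused. Note that inside a `format()` string a literal `%` needs no escaping, while with the `%` operator it must be written `%%`.

```python
s1 = 'Hello, {0}, the score has improved by {1:.1f}%'.format('Xiao Ming', 17.125)
print(s1)  # Hello, Xiao Ming, the score has improved by 17.1%

# An index can be reused; the spec after ':' plays the role of the % type codes.
s2 = '{0} + {0} = {1:d}'.format(2, 4)
print(s2)  # 2 + 2 = 4

# The same result with the % operator (literal % escaped as %%).
s3 = 'Hello, %s, the score has improved by %.1f%%' % ('Xiao Ming', 17.125)
print(s1 == s3)  # True
```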