Encoding secrets (Python version)
While learning Python recently, I kept getting confused by the different encodings, so I read through what others had written on the subject, added my own understanding, and wrote it all down to share with anyone else who is struggling with encoding.
The concept of encoding
Encoding means converting information from one format into another. Computers only understand binary, so, put simply, encoding is the process of converting the text we read into a binary form the computer can handle, and decoding is the reverse: converting binary data back into readable text according to a particular encoding. Since computers only recognize the binary digits 0 and 1, how do the letters, digits, and characters we use map onto them? Read on.
In Python 3 you can check the default encoding like this:
import sys
print(sys.getdefaultencoding())
# Output: utf-8
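To make the idea of "text to binary and back" concrete, here is a minimal Python 3 sketch. The sample string "Hi!" and the variable names are only illustrative choices, not part of the original article.

```python
# Encode a short text into bytes, look at the bits, then decode it back.
text = "Hi!"

encoded = text.encode("ascii")            # str -> bytes (encoding)
bits = " ".join(format(b, "08b") for b in encoded)
print(bits)                               # 01001000 01101001 00100001

decoded = encoded.decode("ascii")         # bytes -> str (decoding)
print(decoded == text)                    # True
```

Each group of eight digits is one byte, and decoding with the same encoding recovers the original text.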
ASCII code
The computer was invented in the United States and was at first used mainly in English-speaking countries, whose alphabet has only 26 letters plus a handful of symbols, so the first encoding rule in use was ASCII. ASCII stands for American Standard Code for Information Interchange. The ASCII table assigns a number to each of these characters.
ASCII stores each character in one byte, that is, a group of 8 binary digits. For example, 00100001 represents the character !. Standard ASCII does not use the highest bit, so its values range from 0 to 127 and it can represent only 128 characters. To cover the characters needed in Western Europe and elsewhere, later extended 8-bit encodings put the highest bit to use, raising the number of representable characters from 128 to 256.
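The byte layout described above can be checked with a short Python 3 sketch. Using Latin-1 as the example of an 8-bit extension is my own assumption for illustration; the article does not name a specific extended encoding.

```python
# '!' is code 33, which is 00100001 as an 8-bit binary number.
code = ord("!")
print(code, format(code, "08b"))          # 33 00100001

# All 128 ASCII codes fit in 7 bits, so the highest bit is always 0.
ascii_chars = [chr(i) for i in range(128)]
print(all(ord(c) < 0x80 for c in ascii_chars))   # True

# A character outside that range cannot be encoded as ASCII.
try:
    "é".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)                      # False

# An 8-bit extension (Latin-1 here, as an assumed example) uses the top bit.
print("é".encode("latin-1"))              # b'\xe9'
```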
In Python, the function ord() converts a character into its numeric value, and the function chr() converts a numeric value into its character:
>>> ord("a")  # convert a character to its numeric value
97
>>> ord("A")
65
>>> chr(65)
'A'
>>> chr(97)  # convert a numeric value to its character
'a'
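GB2312 and GBK
When computers crossed the ocean and arrived in China, ASCII could no longer meet local needs: there are roughly 2,000 to 3,000 Chinese characters in common use alone. So in 1980 China's national standards bureau published 《信息交换用汉字编码字符集》 (Chinese character coded character set for information interchange), that is, the GB2312 standard. GB2312 contains 7,445 characters in total (6,763 Chinese characters and 682 other symbols), including Latin letters, Greek letters, and Japanese hiragana, which basically covered everyday needs.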
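In GB2312 each Chinese character is represented by two bytes, a high byte and a low byte. In the Chinese character area the high byte runs from B0 to F7 and the low byte from A1 to FE, giving 72 * 94 = 6,768 code positions, of which five (D7FA to D7FE) are left empty. A first byte greater than 127 marks the start of a Chinese character (that byte and the next one together represent one character), and the highest bit of each of the two bytes is 1.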
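The two-byte layout and the 72 * 94 arithmetic can be verified with a small Python 3 sketch. It relies on Python's built-in gb2312 codec; the sample character 彬 and the variable names are illustrative choices.

```python
# Encode one common Chinese character with GB2312 and inspect its two bytes.
ch = "彬"
raw = ch.encode("gb2312")
high, low = raw[0], raw[1]
print(len(raw))                           # 2
print(hex(high), hex(low))

# Both bytes fall in the Chinese character area described in the text.
in_hanzi_area = 0xB0 <= high <= 0xF7 and 0xA1 <= low <= 0xFE
print(in_hanzi_area)                      # True

# Both bytes are greater than 127, i.e. their highest bit is 1.
print(high > 127 and low > 127)           # True

# Count the code positions: 72 high-byte values times 94 low-byte values.
rows = 0xF7 - 0xB0 + 1
cols = 0xFE - 0xA1 + 1
print(rows, cols, rows * cols)            # 72 94 6768
print(rows * cols - 5)                    # 6763 (five positions are empty)
```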
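GB2312 cannot handle rare characters that appear in personal names, classical Chinese, and so on, which is why GBK came later. GBK is backward compatible with GB2312. Its code range runs from 8140 to FEFE (excluding xx7F), for a total of 23,940 code positions, and it contains 21,003 Chinese characters. Chinese-language versions of Windows still use GBK as their default encoding today.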
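As a rough illustration of the difference, the Python 3 sketch below tries a character that is commonly cited as missing from GB2312 but present in GBK. The choice of 镕 is my own example and is not taken from the article.

```python
# A character used in personal names that GB2312 does not include.
rare = "镕"

try:
    rare.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

gbk_bytes = rare.encode("gbk")            # GBK can encode it in two bytes
print(in_gb2312, len(gbk_bytes))          # False 2

# Backward compatibility: a GB2312 character has the same bytes in GBK.
common = "彬"
print(common.encode("gb2312") == common.encode("gbk"))   # True
```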
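Unicode and UTF-8
China produced GBK, and other countries and regions produced encodings of their own, such as Japan's Shift_JIS and Taiwan's Big5. International standards bodies saw that this could not go on: with every region doing its own thing, data was incompatible between countries, and software I wrote using GBK would not work correctly on a computer set up for Shift_JIS. So Unicode, a single universal character set, was created. Early Unicode represented every character with 2 bytes, giving 65,536 code points, which was expected to cover all the world's writing systems; the standard has since grown beyond that, and its code space now runs up to U+10FFFF.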
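In Python 3, a character's Unicode code point can be inspected directly with ord(). The short sketch below also shows that today's Unicode code space extends past the 65,536 values that fit in two bytes; the sample characters are illustrative.

```python
import sys

# Every character has one Unicode code point, whatever its language.
print(ord("a"))                           # 97
print(hex(ord("彬")))                     # 0x5f6c

# The highest code point Python 3 supports is U+10FFFF.
print(sys.maxunicode == 0x10FFFF)         # True

# A code point above 0xFFFF no longer fits in two bytes.
big = chr(0x20000)
print(ord(big) > 0xFFFF)                  # True
```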
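That was fine, except that English-speaking users were unhappy: a letter such as 'a' used to take one byte and now took two, which wasted a great deal of memory and disk space. So UTF-32, UTF-16, and UTF-8 appeared as different ways of storing Unicode. I won't go into the first two here, since they are less common for files and network data. UTF-8 is a variable-length encoding: an English letter takes just one byte, a common Chinese character takes 3 bytes, and the rarer Chinese characters in the extended character sets take 4 bytes. Data in memory is held as Unicode, while UTF-8 is used when transmitting and saving data in order to save space and bandwidth.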
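The byte counts mentioned above are easy to check in Python 3. In this sketch the helper name encoded_sizes is hypothetical, and the little-endian codec variants are used only so that no byte order mark is added to the counts.

```python
def encoded_sizes(ch):
    """Return how many bytes one character takes in each UTF encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

print(encoded_sizes("a"))           # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("彬"))          # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes(chr(0x20000)))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

For mostly English text UTF-8 is clearly the most compact of the three, which is the space saving the paragraph describes.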
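Python 2 encoding
In Python 2 the default encoding is ASCII, and there are two string types: str and unicode. Those are only the type names; what matters is what each one actually holds in memory:
#coding=utf8
name = '彬彬'
name2 = u'彬彬'
print repr(name)
print repr(name2)
# Output
'\xe5\xbd\xac\xe5\xbd\xac'  # bytes data
u'\u5f6c\u5f6c'  # Unicode data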
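In Python 2, a str string holds bytes data in memory, and a unicode string holds Unicode data. So how are the two kinds of data related? This is where encoding (encode) and decoding (decode) come in.
#coding=utf8
name = '彬彬'
name2 = u'彬彬'
name3 = name.decode('utf8')
name4 = name2.encode('utf8')
print type(name3)
print type(name4)
# Output
<type 'unicode'>
<type 'str'>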
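As the output shows, converting Unicode to bytes data is encoding, and converting bytes data to Unicode is decoding. Let's look at another example:
#coding=utf8
name = '彬彬'
name3 = name.decode('big5')
print name3
# Output
敶砍蓮
The result is garbled text. In memory, name holds bytes data encoded with UTF-8, but here decode('big5') decodes it as Big5. The call succeeds, yet the output is not what we wanted.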
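When we change the coding declaration on the first line to big5 (and the file itself is saved as Big5), the garbled text disappears:
#coding=big5
name = '彬彬'
name3 = name.decode('big5')
print name3
# Output
彬彬
So whatever encoding was used to encode the data must also be used to decode it!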
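The same mismatch can be reproduced in Python 3, where the bytes have to be created explicitly. This is only a sketch of the idea: the errors="replace" argument is my own precaution so that the wrong decode cannot raise, and the exact garbled characters are not asserted.

```python
text = "彬彬"
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                         # b'\xe5\xbd\xac\xe5\xbd\xac'

# Decoding with the wrong encoding gives garbled text, not the original.
wrong = utf8_bytes.decode("big5", errors="replace")
print(wrong == text)                      # False

# Decoding with the encoding that produced the bytes restores the text.
right = utf8_bytes.decode("utf-8")
print(right == text)                      # True
```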
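Note: the readable text shown in the terminal, which is what you as the user see, has already been converted to Unicode in memory, while bytes data is generally what the computer itself works with.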
Python 3 encoding
Python 3 also defines two string types, str and bytes: str holds Unicode data and bytes holds bytes data.
name = "彬彬"
name2 = b"hello"
print(type(name))
print(type(name.encode('utf8')))
print(type(name.encode('gbk')))
print(type(name2))
print(type(name2.decode('utf8')))
# Output
<class 'str'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'str'>
As the output shows, converting bytes to Unicode is decoding, and converting Unicode to bytes is encoding.
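When converting between different encodings, we always have to pass through Unicode as the transfer station, since Unicode is the one character set that all the others map to. To convert UTF-8 encoded data to GBK, we first decode the UTF-8 data into Unicode and then encode that Unicode as GBK.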
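The decode-then-encode route described above can be sketched in Python 3 as follows. The helper name transcode is made up for this illustration and is not a standard library function.

```python
def transcode(data, src, dst):
    """Convert bytes from one encoding to another, going through Unicode."""
    unicode_text = data.decode(src)       # bytes -> str (Unicode)
    return unicode_text.encode(dst)       # str (Unicode) -> bytes

utf8_data = "彬彬".encode("utf-8")
gbk_data = transcode(utf8_data, "utf-8", "gbk")

print(len(utf8_data), len(gbk_data))      # 6 4
print(gbk_data.decode("gbk"))             # 彬彬
```

The same helper works in the other direction, for example transcode(gbk_data, "gbk", "utf-8").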
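An important difference between Python 2 and Python 3 is that Python 2 automatically decodes bytes data into Unicode when the two are mixed, whereas Python 3 does not. Python 3 therefore separates bytes data and Unicode much more clearly.
# In Python 2
print(u"liu" + "bin")
# Output
liubin
# In Python 3
print("liu" + b"bin")
# Output (the exact TypeError wording depends on the Python 3 version)
Traceback (most recent call last):
  File "...", line 2, in <module>
    print("liu" + b"bin")
TypeError: can only concatenate str (not "bytes") to str
# In Python 3, decode the bytes explicitly first
print("liu" + (b"bin").decode('utf8'))
# Output
liubin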
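The "life" of a .py file
From creating a .py file to executing it, where do encoding and decoding come into play?
1. When we create a .py file, the editor assigns it a default encoding (PyCharm is the example here). It is shown in the lower-right corner and defaults to UTF-8, though you can choose another encoding.
2. As we type code into the .py file, it is held in memory as Unicode:
print("你好,世界!")
3. When we save, the Unicode data is encoded as UTF-8 and written to disk.
4. When we run the file, PyCharm calls the Python interpreter to read it. Python 2 decodes the source as ASCII by default, but ASCII has no Chinese characters, so an error is raised: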
File "E:/py/�ַ�����.py", line 2
SyntaxError: Non-ASCII character '\xe4' in file E:/py/�ַ�����.py on line 2, but no encoding declared; see ... for details
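So in Python 2 we need to add an encoding declaration:
#coding=utf8
print("你好,世界!")
# Output
你好,世界!
In Python 3 this problem does not arise, as long as the file was saved as UTF-8: Python 3's default source encoding is UTF-8, so it decodes the UTF-8 bytes into Unicode and then displays the result in the terminal.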
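The save-and-read-back steps can be imitated in Python 3 with an ordinary file. This is a rough sketch of what an editor and the interpreter do, not their actual implementation; the file name hello.py and the temporary directory are illustrative assumptions.

```python
import os
import tempfile

source = 'print("你好,世界!")\n'       # Unicode text held in memory

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.py")

    # Saving: the Unicode text is encoded as UTF-8 and written to disk.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(source)

    # On disk the file is just bytes.
    with open(path, "rb") as f:
        raw = f.read()

print(raw[7:10])                          # b'\xe4\xbd\xa0' (the bytes of 你)

# Reading as UTF-8 restores the original text.
restored = raw.decode("utf-8")
print(restored == source)                 # True

# Reading as ASCII fails, as in the Python 2 error shown earlier.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)                           # False
```

The first non-ASCII byte is \xe4, the same byte the Python 2 SyntaxError complains about.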