Many of us write Python programs every day, whether to process text or to handle system administration tasks. Once a program is written, running it only requires passing the file to the python command:
$ python some-program.py
So how is a plain-text .py file converted, step by step, into machine instructions that the CPU can execute? And what are the .pyc files that may appear while a program runs, and what are they for?
Although Python behaves much like an interpreted language such as shell script, its execution model is essentially the same as that of Java or C# and can be summarized as a virtual machine plus bytecode. Python executes a program in two steps: it first compiles the source code into bytecode, and then starts the virtual machine to execute that bytecode.
The python command is also called the Python interpreter, but it works differently from a classic script interpreter that executes source text directly. The Python interpreter actually consists of a compiler and a virtual machine. When it starts, it mainly performs the following two steps:
The compiler compiles the Python source code in the .py file into bytecode.
The virtual machine executes the bytecode generated by the compiler, instruction by instruction.
Therefore, the Python statements in the .py file are not directly converted into machine instructions, but into Python bytecode.
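The two steps can be reproduced by hand from within Python. The following is a minimal sketch, not a description of how the python command is implemented internally: it uses the built-in compile and exec functions, and the one-line program and the "<sketch>" file name are made up for illustration.

```python
# Hypothetical one-line program, used only for illustration.
source = "answer = 6 * 7"

# Step 1: the compiler turns the source text into a code object
# that holds the bytecode.
code = compile(source, "<sketch>", "exec")

# Step 2: the virtual machine executes that bytecode in a namespace.
namespace = {}
exec(code, namespace)

print(type(code).__name__)   # code
print(namespace["answer"])   # 42
```

Nothing in the second step looks at the source text again; exec only receives the compiled code object.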
The result of compiling a Python program is bytecode, which captures much of how Python operates at run time. Understanding bytecode is therefore key both to understanding the Python virtual machine in depth and to optimizing the performance of Python programs. So what does Python bytecode look like, and how can we obtain the bytecode of a Python program? Python provides the built-in function compile for compiling source code on the fly. Calling compile with the source code as an argument returns the compilation result.
Below, we compile a program with the compile function.
The source code is saved in the file demo.py:
PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
Before compiling, the source code must be read from the file:
>>> text = open(r'D:\myspace\code\pythonCode\mix\demo.py').read()
>>> print(text)
PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
Then call the compile function to compile the source code:
>>> result = compile(text, r'D:\myspace\code\pythonCode\mix\demo.py', 'exec')
There are 3 required parameters for the compile function:
source: the source code to be compiled
filename: the name of the file the source code came from
mode: the compilation mode, one of the following:
    exec: compiles the source code of a module (a sequence of statements)
    single: compiles a single interactive statement
    eval: compiles a single expression
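The difference between the three modes is easiest to see side by side. The sketch below is illustrative only: the source snippets and the "<sketch>" file name are assumptions, chosen so the block runs without any file on disk.

```python
# 'exec' accepts a whole module: any number of statements.
module_code = compile("x = 1\ny = x + 1", "<sketch>", "exec")
ns = {}
exec(module_code, ns)

# 'eval' accepts exactly one expression; eval() returns its value.
expr_code = compile("x + y", "<sketch>", "eval")
total = eval(expr_code, ns)
print(total)  # 3

# 'single' accepts one interactive statement. Like the interactive
# prompt, it echoes the value of an expression that is not None.
single_code = compile("x + y", "<sketch>", "single")
exec(single_code, ns)  # prints 3

# A statement is not an expression, so 'eval' mode rejects it.
try:
    compile("z = 1", "<sketch>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True

# The filename argument is only recorded on the code object;
# compile() does not open or read that file.
print(module_code.co_filename)  # <sketch>
```

Because filename is only a label, it mainly matters for tracebacks and debuggers, which use it to report where the code came from.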
The compile function returns the compilation result, which we stored in result:
>>> result
<code object <module> at 0x000001DEC2FCF680, file "D:\myspace\code\pythonCode\mix\demo.py", line 1>
>>> result.__class__
<class 'code'>
The result is an object of type code. In CPython's C implementation, the underlying structure is PyCodeObject.
PyCodeObject is defined as follows (the exact layout varies between CPython versions):
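To see what the bytecode inside such a code object looks like, the standard dis module can disassemble it. This is a small sketch with a made-up two-line source string; the exact instructions printed depend on the Python version, so no output is shown.

```python
import dis
import types

# Hypothetical source, used only for illustration.
source = "PI = 3.14\narea = PI * 2 ** 2\n"
result = compile(source, "<sketch>", "exec")

# The type shown as <class 'code'> is exposed as types.CodeType.
print(isinstance(result, types.CodeType))  # True

# The raw bytecode is a bytes object stored in co_code.
raw = result.co_code
print(type(raw).__name__, len(raw))

# dis.dis prints a human-readable listing of the instructions.
dis.dis(result)

# dis.get_instructions yields the same instructions as objects.
opnames = [ins.opname for ins in dis.get_instructions(result)]
```

At module level, assignments such as PI = 3.14 show up in the listing as STORE_NAME instructions.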
/* Bytecode object */
struct PyCodeObject {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_posonlyargcount;     /* #positional only arguments */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    PyObject *co_linetable;     /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;       /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
    /* Scratch space for extra data relating to the code object.
       Type is a void* to keep the format private in codeobject.c to force
       people to go through the proper APIs. */
    void *co_extra;

    /* Per opcodes just-in-time cache
     *
     * To reduce cache size, we use indirect mapping from opcode index to
     * cache object:
     *   cache = co_opcache[co_opcache_map[next_instr - first_instr] - 1]
     */

    // co_opcache_map is indexed by (next_instr - first_instr).
    //  * 0 means there is no cache for this opcode.
    //  * n > 0 means there is cache in co_opcache[n-1].
    unsigned char *co_opcache_map;
    _PyOpcache *co_opcache;
    int co_opcache_flag;  // used to determine when create a cache.
    unsigned char co_opcache_size;  // length of co_opcache.
};
The code object PyCodeObject stores the result of compilation: the bytecode itself together with the constants, names, and other data that the code refers to. Key fields include:
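Field | Purpose |
---|---|
co_argcount | Number of positional parameters, not counting *args |
co_kwonlyargcount | Number of keyword-only parameters |
co_nlocals | Number of local variables |
co_stacksize | Stack space required to execute the code |
co_flags | Flags |
co_firstlineno | Line number of the first line of the code block |
co_code | Instruction opcodes, that is, the bytecode |
co_consts | Constants used by the code |
co_names | Names used by the code |
co_varnames | Local variable names |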
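These fields are readable from Python as attributes of the code object. The sketch below compiles the demo.py source from a string (so that it runs without the file on disk) and inspects a few of them. The variable names module_code, nested, func_code and init_code are introduced here for illustration only.

```python
import types

# Same content as demo.py, inlined so the example is self-contained.
source = '''\
PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
'''

module_code = compile(source, "demo.py", "exec")

# Names referenced at module level (PI, circle_area, Person, ...).
print(module_code.co_names)

# Each function and class body is compiled to its own code object,
# stored among the constants of the enclosing code object.
nested = {c.co_name: c for c in module_code.co_consts
          if isinstance(c, types.CodeType)}
print(sorted(nested))            # ['Person', 'circle_area']

func_code = nested["circle_area"]
print(func_code.co_argcount)     # 1
print(func_code.co_nlocals)      # 1
print(func_code.co_varnames)     # ('r',)
print(func_code.co_names)        # ('PI',)
print(func_code.co_consts)       # contains the constant 2

# The class body in turn holds the code objects of its methods.
class_code = nested["Person"]
init_code = next(c for c in class_code.co_consts
                 if isinstance(c, types.CodeType) and c.co_name == "__init__")
print(init_code.co_varnames)     # ('self', 'name')
```

Note how r is a local variable of circle_area and appears in co_varnames, while PI is looked up by name at run time and therefore appears in co_names.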