The execution process of a Python program includes converting source code into bytecode (i.e. compilation) and executing the bytecode

Release: 2023-05-09 16:37:09
We write Python programs all the time, whether to process some text or to handle system administration tasks. Once a program is written, running it only takes the python command:

$ python some-program.py
So how is a .py file, which is plain text, converted step by step into machine instructions that the CPU can execute? In addition, .pyc files may be generated while a program runs. What are these files for?

1. Execution process

Although Python behaves much like an interpreted language such as shell script, the way a Python program executes is essentially the same as in Java or C#, and can be summarized as virtual machine plus bytecode. Python executes a program in two steps: it first compiles the source code into bytecode, and then starts the virtual machine to execute that bytecode.

Although the python command is also called the Python interpreter, it is fundamentally different from the interpreters of other scripting languages. In fact, the Python interpreter consists of a compiler and a virtual machine. When the interpreter starts, it mainly performs the following two steps:

The compiler compiles the Python source code in the .py file into bytecode.

The virtual machine executes the bytecode generated by the compiler, one instruction at a time.

Therefore, the Python statements in the .py file are not directly converted into machine instructions, but into Python bytecode.
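
The two steps can be reproduced by hand with the built-in compile and exec functions. The following is a minimal sketch, not a description of how the python command is implemented internally; the source string, the placeholder filename "<sketch>", and the variable names are made up for illustration.

```python
# Step 1: compile source text into a code object (which holds the bytecode).
source = "total = sum(range(5))\n"
code = compile(source, "<sketch>", "exec")  # "<sketch>" is a placeholder filename

# Step 2: hand the code object to the virtual machine for execution.
namespace = {}
exec(code, namespace)

print(type(code).__name__)   # the compiled result is a 'code' object
print(namespace["total"])    # the value computed by executing the bytecode
```

Nothing in the second step needs the source text any more: the code object alone is enough to run the program.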

2. Bytecode

The result of compiling a Python program is bytecode, which carries a great deal of information about how Python runs. Bytecode is therefore essential both for understanding how the Python virtual machine works and for optimizing the performance of a Python program.

So, what does Python bytecode look like, and how can we obtain the bytecode of a Python program? Python provides a built-in function, compile, for compiling source code on the fly. We only need to call compile with the source code as an argument to obtain the compilation result.

3. Source code compilation

Below, we compile a program through the compile function:

The source code is saved in the demo.py file:

PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
Before compiling, the source code needs to be read from the file:

>>> text = open(r'D:\myspace\code\pythonCode\mix\demo.py').read()
>>> print(text)
PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
Then call the compile function to compile the source code:

>>> result = compile(text, r'D:\myspace\code\pythonCode\mix\demo.py', 'exec')
There are 3 required parameters for the compile function:

source: source code to be compiled

filename: name of the file that contains the source code

mode: compilation mode; exec means compiling the source code as a module

Three compilation modes:

exec: used to compile module source code

single: used to compile a single Python statement (interactively)

eval: used to compile a single expression
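
The difference between the three modes is easiest to see side by side. The following is a small sketch; the placeholder filenames such as "<exec-demo>" and all variable names are made up for illustration.

```python
import contextlib
import io

# exec mode: a sequence of statements compiled as a module; exec() returns None.
module_code = compile("x = 6 * 7", "<exec-demo>", "exec")
module_ns = {}
exec(module_code, module_ns)

# eval mode: exactly one expression; eval() returns its value.
expr_code = compile("6 * 7", "<eval-demo>", "eval")
expr_value = eval(expr_code)

# single mode: one interactive statement; an expression result is echoed
# through sys.displayhook, as at the >>> prompt, so capture stdout to see it.
single_code = compile("6 * 7", "<single-demo>", "single")
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(single_code, {})
echoed = buffer.getvalue().strip()

print(module_ns["x"], expr_value, echoed)
print(module_code.co_filename)  # the filename argument is recorded on the code object
```

The filename argument does not have to name a real file. It is only stored on the code object and shown in tracebacks and in the repr of the code object.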

4. PyCodeObject

Through the compile function, we obtain the compilation result of the source code, stored in result:

>>> result
<code object <module> at 0x000001DEC2FCF680, file "D:\myspace\code\pythonCode\mix\demo.py", line 1>
>>> result.__class__
<class 'code'>
In the end we get an object of type code, whose underlying C structure is PyCodeObject.

The source code of PyCodeObject is as follows:

/* Bytecode object */
struct PyCodeObject {
    int co_argcount;            /* #arguments, except *args */
    int co_posonlyargcount;     /* #positional only arguments */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    PyObject *co_linetable;     /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;       /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
    /* Scratch space for extra data relating to the code object.
       Type is a void* to keep the format private in codeobject.c to force
       people to go through the proper APIs. */
    void *co_extra;

    /* Per opcodes just-in-time cache
     * To reduce cache size, we use indirect mapping from opcode index to
     * cache object:
     *   cache = co_opcache[co_opcache_map[next_instr - first_instr] - 1]
     */

    // co_opcache_map is indexed by (next_instr - first_instr).
    //  * 0 means there is no cache for this opcode.
    //  * n > 0 means there is cache in co_opcache[n-1].
    unsigned char *co_opcache_map;
    _PyOpcache *co_opcache;
    int co_opcache_flag;  // used to determine when create a cache.
    unsigned char co_opcache_size;  // length of co_opcache.
};
The code object PyCodeObject stores the compilation result, including the bytecode as well as the constants, names, and other data the code refers to. Key fields include:

co_argcount: number of parameters

co_kwonlyargcount: number of keyword-only parameters

co_nlocals: number of local variables

co_stacksize: stack space required to execute the code

co_flags: flags

co_firstlineno: line number of the first line of the code block

co_code: instruction opcodes, that is, the bytecode

co_consts: list of constants

co_names: list of names

co_varnames: list of local variable names

Taking the compilation result of demo.py as an example, these fields can be inspected directly:

>>> result.co_code
>>> result.co_names
('PI', 'circle_area', 'object', 'Person')
>>> result.co_consts
(3.14, <code object circle_area at 0x0000023D04D3F310, file "D:\myspace\code\pythonCode\mix\demo.py", line 3>, 'circle_area', <code object Person at 0x0000023D04D3F5D0, file "D:\myspace\code\pythonCode\mix\demo.py", line 6>, 'Person', None)
>>> person_code = result.co_consts[3]
>>> person_code
<code object Person at 0x0000023D04D3F5D0, file "D:\myspace\code\pythonCode\mix\demo.py", line 6>
>>> person_code.co_consts
('Person', <code object __init__ at 0x0000023D04D3F470, file "D:\myspace\code\pythonCode\mix\demo.py", line 7>, 'Person.__init__', <code object say at 0x0000023D04D3F520, file "D:\myspace\code\pythonCode\mix\demo.py", line 10>, 'Person.say', None)
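
As the session above shows, the code objects of functions and classes are stored as constants of the enclosing code object, so the whole program forms a tree. The following sketch walks that tree; the helper iter_code_objects and the inlined SOURCE string are made up for illustration, and the short filename "demo.py" is only a label.

```python
import types

# Same source as demo.py, inlined so the sketch is self-contained.
SOURCE = '''\
PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
'''

def iter_code_objects(code):
    """Yield a code object and, depth first, every code object nested in co_consts."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

module_code = compile(SOURCE, "demo.py", "exec")
summary = [(c.co_name, c.co_argcount, c.co_varnames)
           for c in iter_code_objects(module_code)]
for name, argcount, varnames in summary:
    print(name, argcount, varnames)
```

Each function body reports its own co_argcount and co_varnames, which confirms that every code block is compiled into a separate code object.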
5. Disassembly

The dis module in the standard library disassembles bytecode into readable instructions:

>>> import dis
>>> dis.dis(result.co_code)
 0 LOAD_CONST               0 (0)
 2 STORE_NAME               0 (0)
 4 LOAD_CONST               1 (1)
 6 LOAD_CONST               2 (2)
 8 MAKE_FUNCTION            0
10 STORE_NAME               1 (1)
12 LOAD_BUILD_CLASS
14 LOAD_CONST               3 (3)
16 LOAD_CONST               4 (4)
18 MAKE_FUNCTION            0
20 LOAD_CONST               4 (4)
22 LOAD_NAME                2 (2)
24 CALL_FUNCTION            3
26 STORE_NAME               3 (3)
28 LOAD_CONST               5 (5)
30 RETURN_VALUE
>>> result.co_consts[0]
3.14
>>> dis.dis(result)
  1           0 LOAD_CONST               0 (3.14)
              2 STORE_NAME               0 (PI)

  3           4 LOAD_CONST               1 (<code object circle_area at 0x0000023D04D3F310, file "D:\myspace\code\pythonCode\mix\demo.py", line 3>)
              6 LOAD_CONST               2 ('circle_area')
              8 MAKE_FUNCTION            0
             10 STORE_NAME               1 (circle_area)

  6          12 LOAD_BUILD_CLASS
             14 LOAD_CONST               3 (<code object Person at 0x0000023D04D3F5D0, file "D:\myspace\code\pythonCode\mix\demo.py", line 6>)
             16 LOAD_CONST               4 ('Person')
             18 MAKE_FUNCTION            0
             20 LOAD_CONST               4 ('Person')
             22 LOAD_NAME                2 (object)
             24 CALL_FUNCTION            3
             26 STORE_NAME               3 (Person)
             28 LOAD_CONST               5 (None)
             30 RETURN_VALUE

Disassembly of <code object circle_area at 0x0000023D04D3F310, file "D:\myspace\code\pythonCode\mix\demo.py", line 3>:
  4           0 LOAD_GLOBAL              0 (PI)
              2 LOAD_FAST                0 (r)
              4 LOAD_CONST               1 (2)
              6 BINARY_POWER
              8 BINARY_MULTIPLY
             10 RETURN_VALUE

Disassembly of <code object Person at 0x0000023D04D3F5D0, file "D:\myspace\code\pythonCode\mix\demo.py", line 6>:
  6           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('Person')
              6 STORE_NAME               2 (__qualname__)

  7           8 LOAD_CONST               1 (<code object __init__ at 0x0000023D04D3F470, file "D:\myspace\code\pythonCode\mix\demo.py", line 7>)
             10 LOAD_CONST               2 ('Person.__init__')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               3 (__init__)

 10          16 LOAD_CONST               3 (<code object say at 0x0000023D04D3F520, file "D:\myspace\code\pythonCode\mix\demo.py", line 10>)
             18 LOAD_CONST               4 ('Person.say')
             20 MAKE_FUNCTION            0
             22 STORE_NAME               4 (say)
             24 LOAD_CONST               5 (None)
             26 RETURN_VALUE

Disassembly of <code object __init__ at 0x0000023D04D3F470, file "D:\myspace\code\pythonCode\mix\demo.py", line 7>:
  8           0 LOAD_FAST                1 (name)
              2 LOAD_FAST                0 (self)
              4 STORE_ATTR               0 (name)
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE

Disassembly of <code object say at 0x0000023D04D3F520, file "D:\myspace\code\pythonCode\mix\demo.py", line 10>:
 11           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('i am')
              4 LOAD_FAST                0 (self)
              6 LOAD_ATTR                1 (name)
              8 CALL_FUNCTION            2
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
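The actual value of the constant or name that each operand refers to is listed in the parentheses next to it. In addition, the bytecode is grouped by statement, with groups separated by blank lines, and the line number of each statement is shown in front of its bytecode. For example, the statement PI = 3.14 is compiled into two bytecode instructions: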

  1           0 LOAD_CONST               0 (3.14)
              2 STORE_NAME               0 (PI)
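
The same information is available programmatically through dis.get_instructions. The following is a small sketch; the placeholder filename "<stmt>" is made up, and the exact instruction list depends on the Python version (newer releases emit additional instructions around the two shown above).

```python
import dis

# Compile a single assignment and list its instructions as (opname, argval) pairs.
code = compile("PI = 3.14", "<stmt>", "exec")
instructions = [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

for opname, argval in instructions:
    print(opname, argval)
```

Whatever the version, the listing contains one instruction that loads the constant 3.14 and a STORE_NAME instruction that binds it to the name PI.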
6. pyc


>>> import demo
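
Importing a module is what normally triggers the creation of a .pyc file: CPython caches the compiled code object on disk, by default under a __pycache__ directory, so that later imports can skip the compilation step. The following sketch imitates this by hand with py_compile and marshal. The module name demo_sketch and the temporary directory are hypothetical, and the 16-byte header size assumes CPython 3.7 or later.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo_sketch.py"          # hypothetical module name
    src.write_text("PI = 3.14\n")

    # Where an import would cache the bytecode for this source file.
    default_cache = importlib.util.cache_from_source(str(src))

    # Compile to an explicit .pyc path and read the file back.
    pyc_path = py_compile.compile(
        str(src), cfile=str(src.with_suffix(".pyc")), doraise=True
    )
    data = Path(pyc_path).read_bytes()

# A .pyc starts with a header (magic number, flags, source mtime/size or hash),
# 16 bytes on CPython 3.7+, followed by the marshalled code object.
magic_matches = data[:4] == importlib.util.MAGIC_NUMBER
code = marshal.loads(data[16:])

namespace = {}
exec(code, namespace)
print(Path(default_cache).name, magic_matches, namespace["PI"])
```

The code object recovered from the file can be executed directly, which is why a valid .pyc lets the interpreter skip compiling the .py file again.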