What is the multi-threading performance of the Python GIL? An in-depth explanation of the GIL

Foreword: When I first came into contact with Python, I kept hearing the word GIL, and found that it was often equated with Python's inability to do multi-threading efficiently. Wanting to know not only what happens but also why it happens, I collected all kinds of material, spent a few hours of free time over a week digging into the GIL, and summarized what I learned in this article. I hope it helps readers understand the GIL better and more objectively.

What is GIL

The first thing to make clear is that the GIL is not a feature of Python. It is a concept introduced by one implementation of the Python interpreter, CPython. C++ is a language (grammar) standard that can be compiled into executable code by different compilers, such as GCC, Intel C++ and Visual C++. The same is true for Python: the same piece of code can be executed by different Python implementations such as CPython, PyPy and Jython. Jython, for example, does not have a GIL. However, CPython is the default Python execution environment almost everywhere, so many people treat CPython as if it were Python and take it for granted that the GIL is a flaw of the Python language. So let's make it clear here: the GIL is not a feature of Python, and Python does not need to rely on the GIL at all.
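To check which implementation is actually running a script, the standard library's platform module can be asked directly. The snippet below is a minimal sketch; the helper name describe_interpreter is made up for this illustration.

```python
import platform


def describe_interpreter():
    """Return (implementation name, version string) for the running interpreter."""
    # platform.python_implementation() returns a name such as
    # 'CPython', 'PyPy', 'Jython' or 'IronPython'.
    return platform.python_implementation(), platform.python_version()


if __name__ == "__main__":
    name, version = describe_interpreter()
    print("Running on {} {}".format(name, version))
```

If the first value is CPython, the rest of this article applies to the interpreter in use.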

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid being misleading, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Doesn't that look bad? A mutex that prevents multiple threads from executing bytecode at the same time is a global lock, and at first glance it looks like a bug! Don't worry, we will analyze it step by step below.

Why is there GIL

Due to physical limitations, the competition between CPU manufacturers over clock frequency has been replaced by competition over core count. To make better use of multi-core processors, multi-threaded programming emerged, and with it came the difficulties of keeping data consistent and state synchronized between threads. Even the caches inside the CPU are no exception: manufacturers have spent a lot of effort on synchronizing data between multiple caches, and this inevitably costs some performance.

Python could not escape this either. To take advantage of multiple cores, Python began to support multi-threading, and the simplest way to guarantee data integrity and state synchronization between threads is naturally a lock. Hence the GIL, one very big lock. As more and more library developers accepted this design, they came to rely heavily on it (that is, they assume Python's internal objects are thread-safe by default, so their implementations do not need to consider additional memory locks or synchronization).

Over time this implementation turned out to be painful and inefficient. But when people tried to split up or remove the GIL, they found that a large amount of library code already depended heavily on it, which makes removal very difficult. How difficult? As an analogy, even a "small project" like MySQL spent nearly 5 years, across versions 5.5, 5.6 and 5.7, splitting the big Buffer Pool Mutex into smaller locks, and the work is still going on. If MySQL, a product with company backing and a fixed development team, has such a hard time, how much harder is it for a highly community-driven group of core developers and contributors like Python's?

So to put it simply, the existence of GIL is more for historical reasons. If we had to do it all over again, we would still have to face the problem of multi-threading, but at least it would be more elegant than the current GIL approach.

The impact of GIL

From the introduction and the official definition above, the GIL is clearly a global exclusive lock, and a global lock clearly has a large impact on multi-threading efficiency. It is almost as if Python were a single-threaded program. Readers may object that as long as the global lock is released at the right moments, efficiency will not be bad: if the GIL is released during time-consuming IO operations, throughput can still improve. In other words, however bad it is, it should not be worse than a single thread. That is true in theory. In practice, Python is worse than you think.

Below we compare the efficiency of Python with multiple threads and with a single thread. The test is very simple: a counter function that loops 100 million times. In one script it is executed twice sequentially, and in the other it is executed in two concurrent threads. We then compare the total execution time. The test environment is a dual-core Mac Pro. Note: to reduce the influence of the threading library's own overhead on the results, the single-threaded code also uses threads. It simply runs them one after the other to simulate a single thread.

Single thread executed sequentially (single_thread.py)
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    # CPU-bound work: count to 100 million
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # wait for this thread before starting the next one
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
Two concurrent threads executed simultaneously (multi_thread.py)
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    # CPU-bound work: count to 100 million
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()  # start both threads before waiting on either
        thread_array[tid] = t
    for i in range(2):
        thread_array[i].join()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
The figure below shows the test results.

It can be seen that Python is actually 45% slower with multiple threads than with a single thread. According to the previous analysis, even with the global GIL, serialized multi-threading should be as efficient as a single thread. So how can the result be this bad?

Let us analyze the reasons for this through the implementation principles of GIL.

Defects in the current GIL design

Scheduling method based on the number of opcodes

The Python community's view was that the operating system's own thread scheduling is already mature and stable, so there is no need to build another one. A Python thread is therefore a pthread in C, scheduled by the operating system's scheduling algorithm (on Linux, for example, CFS). To let each thread get a fair share of CPU time, Python counts the bytecode instructions executed so far and forces the GIL to be released when the count reaches a certain threshold. This also triggers the operating system's thread scheduling (whether a context switch actually happens is, of course, up to the operating system).
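To get a feel for the instructions being counted, the standard library's dis module can list the bytecode that CPython compiles for a function. This is a minimal sketch: my_counter_small and list_opcodes are names made up for this illustration, and the exact instruction names differ between CPython versions.

```python
import dis


def my_counter_small():
    # Same shape as the benchmark's counter, with a tiny loop bound.
    i = 0
    for _ in range(10):
        i = i + 1
    return True


def list_opcodes(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    for opname in list_opcodes(my_counter_small):
        print(opname)
```

Each printed name is one instruction in the compiled function. The loop body is only a handful of them, so a threshold counted in instructions is reached very quickly in a tight loop.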

Pseudocode
while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL
    /* Give Operating System a chance to do thread scheduling */
This model works fine when there is only one CPU core: any thread that is woken up can obtain the GIL, because thread scheduling only happens once the GIL has been released. With multiple cores, problems arise. As the pseudocode shows, there is almost no gap between release GIL and acquire GIL. So by the time a thread on another core has been woken up, in most cases the original thread has already re-acquired the GIL. The woken thread can only burn CPU time in vain, watching the other thread run happily with the GIL. When its time slice ends it goes back to waiting, is woken again, and waits again, repeating the vicious cycle.
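The release-then-reacquire pattern can be imitated with an ordinary lock. The sketch below is only a toy model: toy_gil is a plain threading.Lock standing in for the GIL (the real one lives inside the interpreter's C code), the two workers themselves still run under the real GIL, and all names are made up for this illustration. The counts vary from run to run and between platforms.

```python
import threading

toy_gil = threading.Lock()   # stand-in for the GIL, NOT the real interpreter lock
acquisition_log = []         # which worker held the toy lock for each batch


def worker(name, batches, ops_per_batch=1000):
    for _ in range(batches):
        toy_gil.acquire()
        try:
            acquisition_log.append(name)
            for _ in range(ops_per_batch):
                pass  # "do something"
        finally:
            toy_gil.release()
        # No pause here: the loop tries to re-acquire immediately,
        # like the pseudocode above.


def count_handoffs(log):
    """Count how often the lock moved from one worker to the other."""
    return sum(1 for a, b in zip(log, log[1:]) if a != b)


def run_demo(batches=200):
    acquisition_log.clear()
    threads = [threading.Thread(target=worker, args=(name, batches))
               for name in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(acquisition_log), count_handoffs(acquisition_log)


if __name__ == "__main__":
    total, handoffs = run_demo()
    print("{} batches, {} handoffs between workers".format(total, handoffs))
```

A handoff count far below the number of batches means one worker usually re-acquired the lock before the other got a turn, which is the behavior the paragraph above describes.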

PS: Of course, this implementation is primitive and ugly, and the interaction between the GIL and thread scheduling has been improved gradually in each version of Python, for example by trying to keep hold of the GIL across a thread context switch, or by releasing the GIL while waiting for IO. What cannot be changed is that the GIL makes operating system thread scheduling, already an expensive operation, even more costly.

Further reading on the impact of the GIL

To give an intuitive picture of the GIL's performance impact on multi-threading, here is a borrowed chart of test results (see the figure below). It shows two threads running on a dual-core CPU, both of them CPU-intensive. The green parts indicate that a thread is running and doing useful computation. The red parts indicate time in which a thread was scheduled and woken up but could not obtain the GIL, so it could not do useful work. As the figure shows, the GIL prevents multi-threading from fully using the parallel processing capability of a multi-core CPU.

So can Python's IO-intensive threads benefit from multi-threading? Let's look at the test results below. The colors mean the same as in the previous figure, and the white parts indicate that the IO thread is waiting. When the IO thread receives a data packet and a switch is triggered, it still cannot obtain the GIL because of the CPU-intensive thread, so it ends up waiting again and again.

A simple summary: on multi-core CPUs, Python multi-threading only has a positive effect on IO-intensive work. As soon as there is at least one CPU-intensive thread, multi-threading efficiency drops significantly because of the GIL.
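The IO-bound half of that summary is easy to try out. In this minimal sketch, time.sleep stands in for a blocking IO call (an assumption made for illustration; like real blocking IO in CPython, it does not hold the GIL while waiting), and the function names are invented for the example.

```python
import time
from threading import Thread


def fake_io(seconds):
    # time.sleep blocks without holding the GIL, much like waiting on a socket.
    time.sleep(seconds)


def run_sequential(tasks, seconds):
    """Run the fake IO calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(tasks):
        fake_io(seconds)
    return time.perf_counter() - start


def run_threaded(tasks, seconds):
    """Run the fake IO calls in parallel threads; return elapsed seconds."""
    start = time.perf_counter()
    threads = [Thread(target=fake_io, args=(seconds,)) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("Sequential: {:.2f}s".format(run_sequential(4, 0.2)))
    print("Threaded:   {:.2f}s".format(run_threaded(4, 0.2)))
```

With only waiting threads, the threaded run should take roughly as long as one wait rather than the sum of all of them. Adding a CPU-bound thread to the mix is what brings back the contention described above.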

How to avoid being affected by GIL

After all that, an article that offers no solution would be nothing more than a popular-science post with no practical use. The GIL is this bad, so is there a way around it? Let's look at the options available.

Use multiprocessing to replace Thread

The multiprocessing library exists largely to make up for the inefficiency that the GIL causes in the threading library. It replicates the interface provided by threading to make migration easy. The only difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so there is no GIL contention between processes.

Of course, multiprocessing is not a panacea. It makes data communication and synchronization between the workers in a program harder. Take the counter as an example. If we want several threads to add to the same variable, with threading we declare a global variable and wrap three lines in a threading.Lock context. With multiprocessing, processes cannot see each other's data, so we have to declare a Queue in the main process and put and get values through it, or use shared memory. This extra implementation cost makes writing concurrent programs, which is already painful, even more painful. Where exactly do the difficulties lie? Interested readers can further read this article
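Because the interfaces mirror each other, the benchmark's counter can be moved to processes with very few changes. This is a minimal sketch rather than a tuned benchmark: the loop bound is reduced, my_counter takes the bound as a parameter, and run_in_processes is a name made up for this illustration.

```python
import time
from multiprocessing import Process


def my_counter(n):
    # CPU-bound work, same shape as the threaded benchmark.
    i = 0
    for _ in range(n):
        i = i + 1
    return i


def run_in_processes(workers, n):
    """Start `workers` processes that each count to n; return elapsed seconds."""
    start = time.perf_counter()
    procs = [Process(target=my_counter, args=(n,)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The guard matters: on platforms that start fresh interpreters,
    # child processes re-import this module.
    print("Total time: {:.2f}s".format(run_in_processes(2, 10000000)))
```

Process takes the same target and args parameters as Thread, and start and join behave the same way, which is what makes the migration mostly mechanical.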
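The two approaches to a shared counter can be put side by side. This is a minimal sketch under stated assumptions: the function names are invented for the illustration, and the process version sends one subtotal per worker through a multiprocessing.Queue instead of sharing a variable.

```python
import threading
from multiprocessing import Process, Queue

counter = 0
counter_lock = threading.Lock()


def add_with_lock(times):
    global counter
    for _ in range(times):
        with counter_lock:  # the lock makes the read-modify-write safe
            counter += 1


def threaded_total(workers, times):
    """Threads share the global directly and only need the lock."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_with_lock, args=(times,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_and_report(times, results):
    # Each process counts privately, then sends its subtotal to the parent.
    subtotal = 0
    for _ in range(times):
        subtotal += 1
    results.put(subtotal)


def process_total(workers, times):
    """Processes cannot see the parent's variables, so results travel by Queue."""
    results = Queue()
    procs = [Process(target=count_and_report, args=(times, results))
             for _ in range(workers)]
    for p in procs:
        p.start()
    total = sum(results.get() for _ in range(workers))  # drain before joining
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print("threads:   {}".format(threaded_total(2, 100000)))
    print("processes: {}".format(process_total(2, 100000)))
```

The process version needs an explicit channel and a collection step in the parent, which is the extra implementation cost described above.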

Use other interpreters

As mentioned before, the GIL is only a product of CPython, so are other interpreters better? Yes. Interpreters such as Jython and IronPython do not need a GIL, thanks to the nature of their implementation languages. However, by implementing the interpreter in Java or C#, they also lose the opportunity to use the community's many useful C extension modules. So these interpreters have always been relatively niche. After all, between functionality and performance, everyone chooses functionality first in the early stages. Done is better than perfect.

So it’s hopeless?

Of course, the Python community is also working hard to keep improving the GIL, and has even tried to remove it. Every minor version has brought noticeable improvements. Interested readers can further read this slide deck. Another improvement is Reworking the GIL:
- changing the switching granularity from opcode counting to time-slice counting
- preventing the thread that just released the GIL from being immediately scheduled again
- adding thread priorities (high-priority threads can force other threads to release the GIL they hold)
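The move from opcode counting to time slices is visible from Python code. Since CPython 3.2 the sys module exposes the interval as a number of seconds. The sketch below reads it and changes it temporarily; with_switch_interval is a helper name made up for this illustration.

```python
import sys


def with_switch_interval(seconds, func):
    """Run func() with a temporary switch interval, then restore the old one."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("Current switch interval: {}s".format(sys.getswitchinterval()))
    inside = with_switch_interval(0.01, sys.getswitchinterval)
    print("Inside the helper: {}s".format(inside))
```

The value is the time after which a running thread is asked to give up the GIL when another thread is waiting for it. Changing it trades responsiveness against switching overhead, so any change is best measured on the real workload.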

Summary

The Python GIL is really the product of a trade-off between functionality and performance. There were good reasons for it to exist, and there are objective factors that make it hard to change. From the analysis in this article we can draw a few simple conclusions:
- Because of the GIL, multi-threading only brings better performance in IO-bound scenarios
- If you need high parallel computing performance, consider moving the core parts into C modules, or simply implement them in another language
- The GIL will continue to exist for a long time, but it will keep being improved

Reference

- Python's hardest problem
- Official documents about GIL
- Revisiting thread priorities and the new GIL
