What the Python GIL is, how it works, and how it affects Gunicorn.
Which Gunicorn worker type should you choose for a production environment?
Python has a global interpreter lock (GIL) that allows only one thread at a time to execute Python bytecode. In my opinion, understanding how Python handles concurrency is essential if you want to optimize your Python services.
Python and gunicorn give you different ways to handle concurrency, and since there is no magic bullet that covers all use cases, it's a good idea to understand the options, tradeoffs, and advantages of each option.
Gunicorn exposes these different options under the concept of "workers types". Each type is suitable for a specific set of use cases.
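As a sketch, the worker type is selected with gunicorn's `worker_class` setting (or the `--worker-class` flag); `sync`, `gthread`, `gevent`, and `eventlet` are gunicorn's actual worker class names, while the specific numbers below are illustrative and should be tuned for your service:

```python
# gunicorn.conf.py -- example settings; the numbers are placeholders
workers = 4                # number of forked processes (used by all worker types)

# Pick ONE worker type:
worker_class = "sync"      # default: one request at a time per process
# worker_class = "gthread" # threads per process; also set `threads`
# worker_class = "gevent"  # greenlets; also set `worker_connections`
# worker_class = "eventlet"

threads = 8                # threads per process, used by gthread
worker_connections = 1000  # max concurrent connections, used by gevent/eventlet
```

You would then start the server with `gunicorn -c gunicorn.conf.py app:app`.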
The sync worker is the simplest type: its only concurrency option is to fork N processes that serve requests in parallel.
Sync workers can work well, but they incur significant overhead (memory and CPU context switching), and if most of your request time is spent waiting for I/O, they scale poorly.
The gthread worker improves on this by letting you create N threads per process. This improves I/O performance because more instances of your code can run concurrently. Of the four worker types, this is the one most directly affected by the GIL.
eventlet/gevent workers attempt to further improve the gthread model by running lightweight user threads (aka green threads, greenlets, etc.).
This lets you run thousands of greenlets at very little cost compared to system threads. Another difference is the cooperative (rather than preemptive) scheduling model: a greenlet runs uninterrupted until it blocks. We will first analyze how the gthread worker behaves when processing requests and how it is affected by the GIL.
Unlike sync, where each request is served directly by one process, with gthread each process has N threads, which scales better without the expense of spawning more processes. But since multiple threads run in the same process, the GIL prevents them from running in parallel.
The GIL is not a process or a special thread. It is just a boolean variable whose access is protected by a mutex, which ensures that only one thread is executing Python code within each process at a time. You can see how it works in the picture above: two system threads run concurrently, each handling one request.
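A minimal sketch of the gthread idea, with the request handler reduced to a simulated I/O wait (assumed here to be `time.sleep`, which releases the GIL): two threads serve two requests, and because their waits overlap, both finish in roughly the time of one.

```python
import threading
import time

def handle_request(results, i):
    # Simulated I/O wait; time.sleep releases the GIL,
    # so the other thread can run in the meantime.
    time.sleep(0.2)
    results[i] = "done"

results = {}
start = time.monotonic()
threads = [threading.Thread(target=handle_request, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The I/O waits overlap: total time is ~0.2s, not ~0.4s.
print(f"{len(results)} requests in {elapsed:.2f}s")
```

With CPU-bound work in place of the sleep, the GIL would serialize the two threads and the total time would roughly double instead.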
Another option to increase concurrency without spawning more processes is greenlets. The eventlet/gevent workers spawn "user threads" instead of "system threads" to increase concurrency.
While this means they are not subject to GIL contention, it also means you still cannot increase parallelism, because greenlets within one process cannot be scheduled onto CPU cores in parallel.
In this case, a greenlet worker is clearly not ideal: the second request ends up waiting until the first completes, and the worker then sits idle waiting for I/O again.
In I/O-heavy scenarios, by contrast, the cooperative greenlet model really shines: no time is wasted on context switches, and you avoid the overhead of running many system threads.
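To illustrate cooperative scheduling (this is a toy model, not gevent's actual API), here is a round-robin scheduler built on plain generators: each "greenlet" runs until it voluntarily yields, and is never preempted.

```python
from collections import deque

def greenlet(name, steps, log):
    # Runs until it *chooses* to yield -- cooperative, not preemptive.
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield  # the only point where another greenlet can run

def run(tasks):
    # Round-robin scheduler: resume each task in turn until it finishes.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # not finished: back of the queue
        except StopIteration:
            pass                # finished: drop it

log = []
run([greenlet("a", 2, log), greenlet("b", 2, log)])
print(log)  # → ['a:0', 'b:0', 'a:1', 'b:1']
```

Note that if a task never yields (the analogue of a CPU-bound greenlet), every other task is starved until it completes, which is exactly the latency hazard discussed below.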
We will see this in the benchmarks at the end of this article. This raises a few questions about which worker type to choose and how to tune it.
To answer these questions, you need monitoring in place to collect the necessary metrics, and then benchmarks tailored to those same metrics. There is no point running synthetic benchmarks that have zero correlation with your actual usage patterns. The graphs below show latency and throughput for different scenarios to give you an idea of how it all fits together.
Here we can see how changing the GIL's thread switch interval affects request latency. As expected, I/O latency improves as the switch interval decreases, because CPU-bound threads are forced to release the GIL more often, letting other threads complete their work.
But this is not a panacea. Reducing the switch interval makes CPU-bound threads take longer to complete. We also see an increase in overall latency and in timeouts, due to the added overhead of constant thread switching. If you want to try it yourself, you can change the switch interval using the following code:
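CPython exposes the switch interval through `sys.setswitchinterval` and `sys.getswitchinterval`; the default is 5 ms:

```python
import sys

# Inspect the current GIL thread switch interval (CPython default: 0.005 s).
print(sys.getswitchinterval())

# Force CPU-bound threads to release the GIL more often: 1 ms.
sys.setswitchinterval(0.001)
```

Lowering it trades CPU-bound completion time for I/O responsiveness, as the benchmarks above show.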
Overall, the benchmarks match the intuition from our earlier analysis of how GIL-bound threads and greenlets behave.
gthread has better average latency for I/O-bound requests, because the switch interval forces long-running threads to release the GIL.
gevent has better latency than gthread for CPU-bound requests, because greenlets are not interrupted to serve other requests.
The throughput results also match our earlier gevent vs. gthread intuition. Keep in mind that these benchmarks depend heavily on the type of work being done and may not translate directly to your use case.
The main goal of these benchmarks is to give you some guidance on what to test and measure in order to get the most out of each CPU core that serves requests.
All gunicorn worker types let you specify the number of processes; what changes is how each process handles concurrent connections. Therefore, make sure to use the same number of workers to keep the comparison fair. Let's now try to answer the earlier questions using the data collected from our benchmarks.
Indeed. However, for the vast majority of workloads, it's not a game changer.
How should you choose between gevent/eventlet and gthread when you mix I/O and CPU work? As the results show, gthread tends to allow better concurrency when more of your work is CPU-intensive.
As long as your benchmarks simulate production-like behavior, you will clearly see performance peak and then start to degrade as the number of threads grows too large.
Should I just use sync workers and increase the number of forked processes to avoid the GIL?
Unless your I/O is almost zero, scaling with just processes is not the best option.
Coroutines/Greenlets can improve CPU efficiency because they avoid interrupts and context switches between threads. Coroutines trade latency for throughput.
Coroutines can cause more unpredictable latencies if you mix I/O-bound and CPU-bound endpoints, because CPU-bound endpoints are not interrupted to serve other incoming requests. If you take the time to configure gunicorn correctly, the GIL is not a problem.