What is the GIL in Python
Why GIL is needed
GIL is essentially a lock. Students who have studied operating systems know that locks are introduced to avoid data inconsistencies caused by concurrent access. There are many global variables defined outside functions in CPython, such as usable_arenas and usedpools in memory management. If multiple threads apply for memory at the same time, these variables may be modified at the same time, causing data confusion. In addition, Python's garbage collection mechanism is based on reference counting. All objects have an ob_refcnt field that indicates how many variables currently reference the current object. Operations such as variable assignment and parameter passing will increase the reference count. Exiting the scope or returning from the function will reduce the references. count. Similarly, if multiple threads modify the reference count of the same object at the same time, it is possible that ob_refcnt is different from the real value, which may cause a memory leak. Objects that will not be used will not be recycled. In more serious cases, the object may not be recycled. The referenced object caused the Python interpreter to crash.
Implementation of GIL
The definition of GIL in CPython is as follows
struct _gil_runtime_state { unsigned long interval; // 请求 GIL 的线程在 interval 毫秒后还没成功,就会向持有 GIL 的线程发出释放信号 _Py_atomic_address last_holder; // GIL 上一次的持有线程,强制切换线程时会用到 _Py_atomic_int locked; // GIL 是否被某个线程持有 unsigned long switch_number; // GIL 的持有线程切换了多少次 // 条件变量和互斥锁,一般都是成对出现 PyCOND_T cond; PyMUTEX_T mutex; // 条件变量,用于强制切换线程 PyCOND_T switch_cond; PyMUTEX_T switch_mutex; };
The most essential thing is the locked field protected by mutex, which indicates whether the GIL is currently held. The other fields are for Used to optimize the GIL. When a thread applies for GIL, it calls the take_gil() method, and when it releases GIL, it calls the drop_gil() method. In order to avoid starvation, when a thread waits for interval milliseconds (default is 5 milliseconds) and has not applied for GIL, it will actively send a signal to the thread holding GIL, and the GIL holder will check the signal at the appropriate time. , if it is found that other threads are applying, the GIL will be forcibly released. The appropriate timing mentioned here is different in different versions. In the early days, it was checked every 100 instructions. In Python 3.10.4, it was checked at the end of the conditional statement, the end of each loop body of the loop statement, and the end of the function call. It will be checked when the time comes.
The function take_gil() that applies for GIL is simplified as follows
static void take_gil(PyThreadState *tstate) { ... // 申请互斥锁 MUTEX_LOCK(gil->mutex); // 如果 GIL 空闲就直接获取 if (!_Py_atomic_load_relaxed(&gil->locked)) { goto _ready; } // 尝试等待 while (_Py_atomic_load_relaxed(&gil->locked)) { unsigned long saved_switchnum = gil->switch_number; unsigned long interval = (gil->interval >= 1 ? gil->interval : 1); int timed_out = 0; COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out); if (timed_out && _Py_atomic_load_relaxed(&gil->locked) && gil->switch_number == saved_switchnum) { SET_GIL_DROP_REQUEST(interp); } } _ready: MUTEX_LOCK(gil->switch_mutex); _Py_atomic_store_relaxed(&gil->locked, 1); _Py_ANNOTATE_RWLOCK_ACQUIRED(&gil->locked, /*is_write=*/1); if (tstate != (PyThreadState*)_Py_atomic_load_relaxed(&gil->last_holder)) { _Py_atomic_store_relaxed(&gil->last_holder, (uintptr_t)tstate); ++gil->switch_number; } // 唤醒强制切换的线程主动等待的条件变量 COND_SIGNAL(gil->switch_cond); MUTEX_UNLOCK(gil->switch_mutex); if (_Py_atomic_load_relaxed(&ceval2->gil_drop_request)) { RESET_GIL_DROP_REQUEST(interp); } else { COMPUTE_EVAL_BREAKER(interp, ceval, ceval2); } ... // 释放互斥锁 MUTEX_UNLOCK(gil->mutex); }
In order to ensure atomicity, the entire function body needs to apply for and release the mutex lock gil->mutex at the beginning and end respectively. If the current GIL is idle, get the GIL directly. If it is not idle, wait for the condition variable gil->cond interval milliseconds (not less than 1 millisecond). If it times out and no GIL switching occurs during the period, set gil_drop_request to request forced switching. The GIL holds the thread, otherwise it continues to wait. Once the GIL is successfully obtained, the values of gil->locked, gil->last_holder and gil->switch_number need to be updated, the condition variable gil->switch_cond must be awakened, and the mutex lock gil->mutex must be released.
The function drop_gil() that releases GIL is simplified as follows
static void drop_gil(struct _ceval_runtime_state *ceval, struct _ceval_state *ceval2, PyThreadState *tstate) { ... if (tstate != NULL) { _Py_atomic_store_relaxed(&gil->last_holder, (uintptr_t)tstate); } MUTEX_LOCK(gil->mutex); _Py_ANNOTATE_RWLOCK_RELEASED(&gil->locked, /*is_write=*/1); // 释放 GIL _Py_atomic_store_relaxed(&gil->locked, 0); // 唤醒正在等待 GIL 的线程 COND_SIGNAL(gil->cond); MUTEX_UNLOCK(gil->mutex); if (_Py_atomic_load_relaxed(&ceval2->gil_drop_request) && tstate != NULL) { MUTEX_LOCK(gil->switch_mutex); // 强制等待一次线程切换才被唤醒,避免饥饿 if (((PyThreadState*)_Py_atomic_load_relaxed(&gil->last_holder)) == tstate) { assert(is_tstate_valid(tstate)); RESET_GIL_DROP_REQUEST(tstate->interp); COND_WAIT(gil->switch_cond, gil->switch_mutex); } MUTEX_UNLOCK(gil->switch_mutex); } }
First release the GIL under the protection of gil->mutex, and then wake up other threads that are waiting for the GIL. In a multi-CPU environment, the current thread has a higher probability of reacquiring the GIL after releasing the GIL. In order to avoid starving other threads, the current thread needs to be forced to wait for the condition variable gil->switch_cond. It can only obtain the GIL when other threads Only then will the current thread be awakened.
Some notes
GIL optimization
Code subject to GIL constraints cannot be executed in parallel, which reduces the overall performance. In order to minimize the performance loss, Python does not perform IO operations or not When intensive CPU calculations involving object access occur, the GIL will be actively released, reducing the granularity of the GIL, such as
reading and writing files
Network access
Encrypted data/Compressed data
So strictly speaking, in the case of a single process, multiple Python threads may be accessed simultaneously Execution, for example, one thread is running normally and another thread is compressing data.
The consistency of user data cannot rely on GIL
GIL is a lock generated to maintain the consistency of internal variables of the Python interpreter. The consistency of user data is not responsible for GIL. Although GIL also ensures the consistency of user data to a certain extent. For example, instructions that do not involve jumps and function calls in Python 3.10.4 will be executed atomically under the constraints of GIL, but the consistency of data in business logic The user needs to lock it himself to ensure it.
The following code uses two threads to simulate the user's collection of fragments and winning awards
from threading import Thread def main(): stat = {"piece_count": 0, "reward_count": 0} t1 = Thread(target=process_piece, args=(stat,)) t2 = Thread(target=process_piece, args=(stat,)) t1.start() t2.start() t1.join() t2.join() print(stat) def process_piece(stat): for i in range(10000000): if stat["piece_count"] % 10 == 0: reward = True else: reward = False if reward: stat["reward_count"] += 1 stat["piece_count"] += 1 if __name__ == "__main__": main()
Assuming that the user can get a reward every time he collects 10 fragments, and each thread has collected 10,000,000 fragments, it should be 9999999 rewards were obtained (the last time was not calculated), a total of 20000000 fragments should be collected, and 1999998 rewards were obtained, but the results of the first run on my computer were as follows
{'piece_count': 20000000, 'reward_count': 1999987}
The total number of fragments is consistent with expectations, but the number of rewards But there are 12 missing. The number of pieces is correct because in Python 3.10.4, stat["piece_count"] = 1 is executed atomically under GIL constraints. Since the execution thread may be switched at the end of each loop, it is possible that thread t1 will increase piece_count to 100 at the end of a certain loop, but before the next loop starts to judge modulo 10, the Python interpreter switches to thread t2 for execution, and t2 will increase piece_count. If you reach 101, you will miss a reward.
Attachment: How to avoid being affected by GIL
Having said so much, if I don’t talk about the solution, it is just a popular science post, but it is useless. GIL is so bad, is there a way around it? Let’s take a look at what solutions are available.
Use multiprocess to replace Thread
The emergence of the multiprocess library is largely to make up for the inefficiency of the thread library due to GIL. It completely replicates a set of interfaces provided by thread to facilitate migration. The only difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so there will be no GIL contention between processes.
Of course multiprocess is not a panacea. Its introduction will increase the difficulty of data communication and synchronization between time threads in the program. Take the counter as an example. If we want multiple threads to accumulate the same variable, for thread, declare a global variable and wrap three lines with the thread.Lock context. In multiprocess, since the processes cannot see each other's data, they can only declare a Queue in the main thread, put and then get, or use shared memory. This additional implementation cost makes coding multi-threaded programs, which is already very painful, even more painful. Where are the specific difficulties? Interested readers can further read this article
Use other parsers
As mentioned before, since GIL is only a product of CPython, are other parsers better? ? Yes, parsers like JPython and IronPython do not require the help of the GIL due to the nature of their implementation languages. However, by using Java/C# for the parser implementation, they also lost the opportunity to take advantage of the community's many useful features of the C language module. So these parsers have always been relatively niche. After all, everyone will choose the former over function and performance in the early stage. Done is better than perfect.
So it’s hopeless?
Of course, the Python community is also working very hard to continuously improve the GIL, and even try to remove the GIL. And there have been a lot of improvements in each minor version. Interested readers can further read this Slide
Another improvement Reworking the GIL
– Change the switching granularity from based on opcode counting to based on time slice counting
&ndash ; Prevent the thread that recently released the GIL lock from being scheduled again immediately
– Added thread priority function (high-priority threads can force other threads to release the GIL lock they hold)
The above is the detailed content of What is the GIL in Python. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



PHP and Python have their own advantages and disadvantages, and the choice depends on project needs and personal preferences. 1.PHP is suitable for rapid development and maintenance of large-scale web applications. 2. Python dominates the field of data science and machine learning.

Enable PyTorch GPU acceleration on CentOS system requires the installation of CUDA, cuDNN and GPU versions of PyTorch. The following steps will guide you through the process: CUDA and cuDNN installation determine CUDA version compatibility: Use the nvidia-smi command to view the CUDA version supported by your NVIDIA graphics card. For example, your MX450 graphics card may support CUDA11.1 or higher. Download and install CUDAToolkit: Visit the official website of NVIDIACUDAToolkit and download and install the corresponding version according to the highest CUDA version supported by your graphics card. Install cuDNN library:

Docker uses Linux kernel features to provide an efficient and isolated application running environment. Its working principle is as follows: 1. The mirror is used as a read-only template, which contains everything you need to run the application; 2. The Union File System (UnionFS) stacks multiple file systems, only storing the differences, saving space and speeding up; 3. The daemon manages the mirrors and containers, and the client uses them for interaction; 4. Namespaces and cgroups implement container isolation and resource limitations; 5. Multiple network modes support container interconnection. Only by understanding these core concepts can you better utilize Docker.

Python and JavaScript have their own advantages and disadvantages in terms of community, libraries and resources. 1) The Python community is friendly and suitable for beginners, but the front-end development resources are not as rich as JavaScript. 2) Python is powerful in data science and machine learning libraries, while JavaScript is better in front-end development libraries and frameworks. 3) Both have rich learning resources, but Python is suitable for starting with official documents, while JavaScript is better with MDNWebDocs. The choice should be based on project needs and personal interests.

MinIO Object Storage: High-performance deployment under CentOS system MinIO is a high-performance, distributed object storage system developed based on the Go language, compatible with AmazonS3. It supports a variety of client languages, including Java, Python, JavaScript, and Go. This article will briefly introduce the installation and compatibility of MinIO on CentOS systems. CentOS version compatibility MinIO has been verified on multiple CentOS versions, including but not limited to: CentOS7.9: Provides a complete installation guide covering cluster configuration, environment preparation, configuration file settings, disk partitioning, and MinI

PyTorch distributed training on CentOS system requires the following steps: PyTorch installation: The premise is that Python and pip are installed in CentOS system. Depending on your CUDA version, get the appropriate installation command from the PyTorch official website. For CPU-only training, you can use the following command: pipinstalltorchtorchvisiontorchaudio If you need GPU support, make sure that the corresponding version of CUDA and cuDNN are installed and use the corresponding PyTorch version for installation. Distributed environment configuration: Distributed training usually requires multiple machines or single-machine multiple GPUs. Place

When installing PyTorch on CentOS system, you need to carefully select the appropriate version and consider the following key factors: 1. System environment compatibility: Operating system: It is recommended to use CentOS7 or higher. CUDA and cuDNN:PyTorch version and CUDA version are closely related. For example, PyTorch1.9.0 requires CUDA11.1, while PyTorch2.0.1 requires CUDA11.3. The cuDNN version must also match the CUDA version. Before selecting the PyTorch version, be sure to confirm that compatible CUDA and cuDNN versions have been installed. Python version: PyTorch official branch

CentOS Installing Nginx requires following the following steps: Installing dependencies such as development tools, pcre-devel, and openssl-devel. Download the Nginx source code package, unzip it and compile and install it, and specify the installation path as /usr/local/nginx. Create Nginx users and user groups and set permissions. Modify the configuration file nginx.conf, and configure the listening port and domain name/IP address. Start the Nginx service. Common errors need to be paid attention to, such as dependency issues, port conflicts, and configuration file errors. Performance optimization needs to be adjusted according to the specific situation, such as turning on cache and adjusting the number of worker processes.
