While reading the PHP source code and learning PHP extension development, I ran into a large number of macros containing the word "TSRM". After some research I learned that these macros belong to Zend's thread safety mechanism. Most material simply recommends using them according to established rules, without explaining what they actually do. Not knowing what is going on is always uncomfortable, so I read the source code and the limited material available to get a rough understanding of the mechanism. This article summarizes what I found.
This article first explains the concept of thread safety and its background in PHP. It then examines in detail PHP's thread safety mechanism, ZTS (Zend Thread Safety), and its concrete implementation, TSRM, covering the relevant data structures, implementation details, and runtime behavior. Finally, it looks at how Zend is selectively compiled for single-threaded and multi-threaded environments.
Thread safety
The thread safety problem, in a nutshell, is how to access shared resources safely in a multi-threaded environment. Each thread has only a private stack and shares the heap of the process it belongs to. In C, a variable declared outside any function is a global variable, and it is placed in the shared storage of the process. Different threads refer to the same address, so if one thread modifies the variable, every thread sees the change. This looks like a convenient way for threads to share data, but PHP typically handles one request per thread, so each thread should have its own copy of the global variables and requests should not interfere with each other.
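To make the difference concrete, here is a minimal sketch in C with POSIX threads. It is only an illustration, not PHP code: the names shared_value, private_value, struct observed and run_two_requests are hypothetical, and the per-thread copy uses the C11 `_Thread_local` keyword rather than anything from Zend. Two workers stand in for two requests and run one after the other, so there is no data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int shared_value;                 /* ordinary global: one copy per process */
static _Thread_local int private_value;  /* C11 thread-local: one copy per thread */

/* What a worker found in both variables before writing to them. */
struct observed {
    int shared_seen;
    int private_seen;
};

static void *worker(void *arg)
{
    struct observed *out = arg;

    out->shared_seen = shared_value;
    out->private_seen = private_value;

    /* Simulate a request leaving state behind in both variables. */
    shared_value += 1;
    private_value += 1;
    return NULL;
}

/* Runs two workers sequentially and reports what the second one observed.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_two_requests(struct observed *second)
{
    pthread_t tid;
    struct observed first;

    assert(second != NULL);

    if (pthread_create(&tid, NULL, worker, &first) != 0)
        return -1;
    pthread_join(tid, NULL);

    if (pthread_create(&tid, NULL, worker, second) != 0)
        return -1;
    pthread_join(tid, NULL);
    return 0;
}
```

The second worker sees shared_seen == 1, the value left behind by the first worker, while private_seen is still 0 because every thread starts with a fresh copy. The first behavior is the cross-request interference described above; the second is what PHP wants for its globals. On some systems the program must be built with the -pthread flag.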
Early PHP was mostly used in single-threaded environments. Each process ran only one thread, so thread safety was not an issue. Later PHP began to be used in multi-threaded environments, so Zend introduced the Zend Thread Safety (ZTS) mechanism to ensure thread safety.
Basic principles and implementation of ZTS
The basic idea of ZTS is very intuitive. If every global variable needs a copy in each thread, then ZTS simply provides a mechanism for exactly that:
In a multi-threaded environment, requesting a global variable is no longer a matter of simply declaring a variable. Instead, the process allocates a region on the heap to serve as a "thread global variable pool", which is initialized when the process starts. When a thread needs a global variable, it calls the appropriate TSRM function (TSRM, the Thread Safe Resource Manager, is the concrete implementation of ZTS) and passes the necessary parameters, such as the size of the variable. TSRM allocates the corresponding memory block in the pool and returns a reference ID for it. The next time the thread needs to read or write the variable, it passes this unique ID to TSRM, which performs the actual access. This is how thread-safe global variables are achieved. The following figure gives a schematic diagram of the ZTS principle:
Thread1 and Thread2 belong to the same process, and each needs a global variable, Global Var. TSRM allocates a separate area for each of them in the thread global memory pool (the yellow part) and identifies each area by a unique ID, so the two threads can access their own variables through TSRM without interfering with each other.
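The principle can be illustrated with a toy pool. This is a sketch under simplifying assumptions, not how TSRM is written: all names (toy_allocate_id, toy_fetch and the TOY_ limits) are hypothetical, threads are represented by small integers instead of real thread IDs, the tables have fixed sizes, and there is no locking.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_THREADS 4   /* hypothetical limits, chosen for brevity */
#define TOY_MAX_GLOBALS 8

static size_t toy_sizes[TOY_MAX_GLOBALS];                  /* size registered for each ID */
static void  *toy_pool[TOY_MAX_THREADS][TOY_MAX_GLOBALS];  /* one slot per (thread, ID) */
static int    toy_next_id;

/* Register a global of the given size once for the whole process.
 * Returns its ID, or -1 if the table is full. */
int toy_allocate_id(size_t size)
{
    if (toy_next_id >= TOY_MAX_GLOBALS)
        return -1;
    toy_sizes[toy_next_id] = size;
    return toy_next_id++;
}

/* Return the given thread's private copy of global `id`, creating it
 * zero-filled on first use. Returns NULL if memory runs out. */
void *toy_fetch(int thread, int id)
{
    assert(thread >= 0 && thread < TOY_MAX_THREADS);
    assert(id >= 0 && id < toy_next_id);

    if (toy_pool[thread][id] == NULL)
        toy_pool[thread][id] = calloc(1, toy_sizes[id]);
    return toy_pool[thread][id];
}
```

A caller registers the variable once with toy_allocate_id(sizeof(int)) and afterwards only passes the returned ID around. toy_fetch(0, id) and toy_fetch(1, id) then return two different blocks, so a write made on behalf of thread 0 is invisible to thread 1.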
Let’s now look at how Zend implements this mechanism, using specific code snippets. I am using the PHP 5.3.8 source code here.
The implementation code of TSRM is in the "TSRM" directory of the PHP source code.
There are two important data structures in TSRM: tsrm_tls_entry and tsrm_resource_type. Let’s look at tsrm_tls_entry first.
tsrm_tls_entry is defined in TSRM/TSRM.c:
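typedef struct _tsrm_tls_entry tsrm_tls_entry;

struct _tsrm_tls_entry {
    void **storage;         /* pointers to this thread's global variables */
    int count;              /* number of global variables */
    THREAD_T thread_id;     /* ID of the owning thread */
    tsrm_tls_entry *next;   /* next node in the list */
};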
Each tsrm_tls_entry structure represents all the global variable resources of one thread: thread_id stores the thread ID, count records the number of global variables, and next points to the next node. storage can be viewed as an array of pointers in which each element points to one global variable of the thread this node represents. The tsrm_tls_entry nodes of all threads are linked into a list, and the head pointer of that list is assigned to a global static variable, tsrm_tls_table. Note that because tsrm_tls_table is a real global variable, all threads share it, which keeps memory management consistent across threads. The structure of tsrm_tls_entry and tsrm_tls_table is shown schematically in the accompanying figure.
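The per-thread list described above can be sketched as follows. This is a simplified illustration rather than the code in TSRM.c: the sketch_ names are hypothetical, unsigned long stands in for THREAD_T, and the locking that a shared list needs in a real multi-threaded program is left out.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketch_tls_entry sketch_tls_entry;

struct sketch_tls_entry {
    void **storage;            /* storage[i] points to this thread's copy of global i */
    int count;                 /* number of slots in storage */
    unsigned long thread_id;   /* stand-in for THREAD_T */
    sketch_tls_entry *next;    /* next thread's node */
};

/* Head of the list; a true global shared by every thread. */
static sketch_tls_entry *sketch_tls_list;

/* Find the node for thread_id, or create and link a new, empty one.
 * Returns NULL only if memory runs out. */
sketch_tls_entry *sketch_find_or_create(unsigned long thread_id)
{
    sketch_tls_entry *entry;

    for (entry = sketch_tls_list; entry != NULL; entry = entry->next)
        if (entry->thread_id == thread_id)
            return entry;

    entry = calloc(1, sizeof(*entry));
    if (entry == NULL)
        return NULL;
    entry->thread_id = thread_id;
    entry->next = sketch_tls_list;   /* push onto the head of the list */
    sketch_tls_list = entry;
    return entry;
}

/* Return the thread's copy of global `index`, or NULL if the thread is
 * unknown or has no such slot. */
void *sketch_get_global(unsigned long thread_id, int index)
{
    sketch_tls_entry *entry;

    assert(index >= 0);
    for (entry = sketch_tls_list; entry != NULL; entry = entry->next)
        if (entry->thread_id == thread_id)
            return index < entry->count ? entry->storage[index] : NULL;
    return NULL;
}
```

Looking up a variable therefore takes two steps: find the node whose thread_id matches, then index into its storage array. Two threads that use the same index reach different memory because each node owns a separate storage array.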
The internal structure of tsrm_resource_type is relatively simple:
typedef struct {
    size_t size;
    ts_allocate_ctor ctor;
    ts_allocate_dtor dtor;
    int done;
} tsrm_resource_type;
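As mentioned above, tsrm_tls_entry is organized per thread (one node per thread), whereas tsrm_resource_type is organized per resource (that is, per global variable): each time a new resource is allocated, a tsrm_resource_type is created. All tsrm_resource_type entries are stored as an array (a linear list) called tsrm_resource_table, and the index of an entry is the ID of the resource. Each tsrm_resource_type stores the size of the resource and pointers to its constructor and destructor. To some extent, tsrm_resource_table can be viewed as a hash table whose key is the resource ID and whose value is a tsrm_resource_type structure.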
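A minimal sketch of such a resource table is shown below. It follows the description in this article (the ID is the array index), but the code itself is an assumption for illustration and not the TSRM source: the sketch_ names, the function pointer types and the sample constructor are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*sketch_ctor)(void *resource);
typedef void (*sketch_dtor)(void *resource);

typedef struct {
    size_t size;        /* bytes needed for one copy of the resource */
    sketch_ctor ctor;   /* run on every new copy, may be NULL */
    sketch_dtor dtor;   /* run before a copy is freed, may be NULL */
} sketch_resource_type;

static sketch_resource_type *sketch_resource_table;
static int sketch_resource_count;

/* Append a descriptor; its array index is the resource ID.
 * Returns the ID, or -1 if the table could not grow. */
int sketch_register_resource(size_t size, sketch_ctor ctor, sketch_dtor dtor)
{
    sketch_resource_type *grown = realloc(sketch_resource_table,
        (size_t)(sketch_resource_count + 1) * sizeof(*grown));

    if (grown == NULL)
        return -1;
    sketch_resource_table = grown;
    grown[sketch_resource_count].size = size;
    grown[sketch_resource_count].ctor = ctor;
    grown[sketch_resource_count].dtor = dtor;
    return sketch_resource_count++;
}

/* Build one thread's copy of resource `id`: zeroed memory, then the ctor. */
void *sketch_instantiate(int id)
{
    const sketch_resource_type *type;
    void *resource;

    assert(id >= 0 && id < sketch_resource_count);
    type = &sketch_resource_table[id];
    resource = calloc(1, type->size);
    if (resource != NULL && type->ctor != NULL)
        type->ctor(resource);
    return resource;
}

/* Run the dtor, if any, and release the copy. */
void sketch_destroy(int id, void *resource)
{
    assert(id >= 0 && id < sketch_resource_count);
    if (resource == NULL)
        return;
    if (sketch_resource_table[id].dtor != NULL)
        sketch_resource_table[id].dtor(resource);
    free(resource);
}

/* Sample constructor for an int-sized resource: start every copy at 10. */
void sketch_counter_ctor(void *resource)
{
    *(int *)resource = 10;
}
```

The descriptor is stored once per resource, while sketch_instantiate would be called once per thread, which is why the size and the constructor have to be kept in the table: a thread that starts later can still build its own copy of every resource that has already been registered.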