1. Basic knowledge
This chapter briefly introduces some of the internal mechanisms of the Zend engine. This knowledge is closely related to writing extensions and can also help us write more efficient PHP code.
1.1 Storage of PHP variables
1.1.1 zval structure
Zend uses the zval structure to store the value of PHP variables. The structure is as follows:
typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount;
    zend_uchar type;            /* active type */
    zend_uchar is_ref;
};

typedef struct _zval_struct zval;
Zend determines which member of value to access based on the type value. The available values are as follows:
IS_NULL N/A
IS_LONG corresponds to value.lval
IS_DOUBLE corresponds to value.dval
IS_STRING corresponds to value.str
IS_ARRAY corresponds to value.ht
IS_OBJECT corresponds to value.obj
IS_BOOL corresponds to value.lval
IS_RESOURCE corresponds to value.lval
Two interesting things can be seen from this table. First, a PHP array is actually a HashTable, which explains why PHP can support associative arrays. Second, a resource is a long value, which usually stores a pointer, an index into an internal array, or something else that only its creator understands. It can be regarded as a handle.
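The type-to-member mapping can be illustrated with a simplified, self-contained tagged union. The names below (toy_zval, TOY_IS_*, toy_describe) are hypothetical stand-ins and not the Zend definitions; only the idea of reading the member selected by type is taken from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the Zend type tags. */
enum toy_type { TOY_IS_NULL, TOY_IS_LONG, TOY_IS_DOUBLE, TOY_IS_STRING, TOY_IS_BOOL };

typedef struct {
    union {
        long lval;                                  /* TOY_IS_LONG and TOY_IS_BOOL */
        double dval;                                /* TOY_IS_DOUBLE */
        struct { const char *val; int len; } str;   /* TOY_IS_STRING */
    } value;
    enum toy_type type;                             /* selects the active member */
} toy_zval;

/* Writes a textual form of z into buf, reading only the union member
 * that z->type marks as active. Returns what snprintf returns. */
int toy_describe(const toy_zval *z, char *buf, size_t size)
{
    assert(z != NULL && buf != NULL);
    switch (z->type) {
    case TOY_IS_NULL:
        return snprintf(buf, size, "null");
    case TOY_IS_LONG:
        return snprintf(buf, size, "long(%ld)", z->value.lval);
    case TOY_IS_DOUBLE:
        return snprintf(buf, size, "double(%.2f)", z->value.dval);
    case TOY_IS_BOOL:
        return snprintf(buf, size, "bool(%s)", z->value.lval ? "true" : "false");
    case TOY_IS_STRING:
        return snprintf(buf, size, "string(%d) \"%.*s\"",
                        z->value.str.len, z->value.str.len, z->value.str.val);
    }
    return snprintf(buf, size, "unknown");
}
```

Note that the boolean case reads lval, just as IS_BOOL shares value.lval with IS_LONG in the table above.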
1.1.2 Reference counting
Reference counting is widely used in garbage collection, memory pools, strings, etc. Zend implements typical reference counting. Multiple PHP variables can share the same zval through the reference counting mechanism. The remaining two members of zval, is_ref and refcount, are used to support this sharing.
Obviously, refcount is used for counting. When the reference is increased or decreased, this value is also incremented and decremented accordingly. Once it decreases to zero, Zend will recycle the zval.
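The counting rule can be sketched in a few lines of standalone C. The type toy_val and the functions toy_new, toy_addref, toy_delref and toy_live_count are hypothetical names used only for this illustration; the real engine works on zval and uses its own allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a reference-counted value. */
typedef struct {
    long lval;
    unsigned int refcount;
} toy_val;

static int toy_live = 0;   /* number of toy_val objects currently allocated */

/* Creates a value owned by exactly one variable. */
toy_val *toy_new(long v)
{
    toy_val *z = malloc(sizeof *z);
    assert(z != NULL);
    z->lval = v;
    z->refcount = 1;
    toy_live++;
    return z;
}

/* Another variable starts pointing at z. */
toy_val *toy_addref(toy_val *z)
{
    z->refcount++;
    return z;
}

/* A variable stops pointing at z; the last one out frees it. */
void toy_delref(toy_val *z)
{
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        free(z);
        toy_live--;
    }
}

int toy_live_count(void)
{
    return toy_live;
}
```

The value is released exactly when the last holder calls toy_delref, which is the recycling behavior described above.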
What about is_ref?
1.1.3 zval status
In PHP there are two kinds of variables, reference and non-reference, and both are stored in Zend using reference counting. Non-reference variables must be independent of each other: modifying one must not affect any other. Sharing a zval appears to conflict with this requirement, and the conflict is resolved by the Copy-On-Write mechanism: when a variable is about to be written and Zend finds that the zval it points to is shared by several variables, Zend gives the variable its own copy of the zval with a refcount of 1 and decrements the refcount of the original zval. This process is called "zval separation". Reference variables have the opposite requirement: variables assigned by reference must stay bound together, so modifying one of them modifies all of them.
Zend therefore needs to know which state a zval is in so that it can handle the two situations differently, and is_ref serves this purpose. It indicates whether the variables currently pointing to the zval were assigned by reference; either all of them are references or none is. When one of those variables is modified, Zend performs Copy-On-Write only if the is_ref of its zval is 0, that is, only if the zval is not a reference.
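The separation step can be sketched as follows. This is a minimal model under stated assumptions, not Zend's code: toy_zval, toy_var and the toy_* functions are hypothetical, values are plain longs, and malloc stands in for the engine's allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long lval;
    unsigned int refcount;
    unsigned char is_ref;   /* 1: all holders are bound by reference */
} toy_zval;

/* A "variable" is just a slot holding a pointer to a possibly shared value. */
typedef struct { toy_zval *zv; } toy_var;

static toy_zval *toy_alloc(long v)
{
    toy_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

void toy_init(toy_var *var, long v)
{
    var->zv = toy_alloc(v);
}

/* Non-reference assignment to a fresh variable: share the value and bump
 * the count. This sketch assumes src does not hold a reference value. */
void toy_assign(toy_var *dst, toy_var *src)
{
    assert(!src->zv->is_ref);
    src->zv->refcount++;
    dst->zv = src->zv;
}

/* Write through a variable, separating first if the value is shared
 * by non-reference variables. */
void toy_write(toy_var *var, long v)
{
    toy_zval *z = var->zv;
    if (!z->is_ref && z->refcount > 1) {
        z->refcount--;                  /* the original stays with the others */
        var->zv = toy_alloc(z->lval);   /* private copy with refcount == 1 */
    }
    var->zv->lval = v;
}

void toy_release(toy_var *var)
{
    if (--var->zv->refcount == 0) {
        free(var->zv);
    }
    var->zv = NULL;
}
```

Until toy_write is called, the two variables cost a single allocation; the copy is made only at the moment their values would diverge.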
1.1.4 zval state switching
When all the assignments performed on a zval are reference assignments, or all are non-reference assignments, the single is_ref flag is enough to cope with it. However, the world is not always so beautiful, and PHP cannot impose such a restriction on users. When reference and non-reference assignments are mixed, special handling is required.
Case I: consider PHP code made up of four statements, in which a, b and c are bound together by reference assignment and d is then assigned from them by value.
The first three statements point a, b and c at one zval whose is_ref=1 and refcount=3. The fourth statement is a non-reference assignment, which would normally only need to increment the reference count. However, the target zval is a reference zval, so simply incrementing the count would wrongly bind d to the other three variables. Zend's solution is to create a separate copy of the zval for d.
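Case I can be modeled with the sketch below. It is an assumption-laden toy rather than the engine's logic: toy_zval, toy_var, toy_assign_ref and toy_assign_val are hypothetical names, and toy_assign_ref only covers the simple situation in which the source value is held by one variable or is already a reference value.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long lval;
    unsigned int refcount;
    unsigned char is_ref;
} toy_zval;

typedef struct { toy_zval *zv; } toy_var;

static toy_zval *toy_alloc(long v)
{
    toy_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

void toy_init(toy_var *var, long v)
{
    var->zv = toy_alloc(v);
}

/* Reference assignment to a fresh variable. Only the simple case is
 * handled: src's value has a single holder or is already a reference. */
void toy_assign_ref(toy_var *dst, toy_var *src)
{
    assert(src->zv->is_ref || src->zv->refcount == 1);
    src->zv->is_ref = 1;
    src->zv->refcount++;
    dst->zv = src->zv;
}

/* Non-reference assignment to a fresh variable. */
void toy_assign_val(toy_var *dst, toy_var *src)
{
    if (src->zv->is_ref) {
        /* dst must not join the reference set, so it gets its own copy */
        dst->zv = toy_alloc(src->zv->lval);
    } else {
        /* plain sharing; a later write would separate the value */
        src->zv->refcount++;
        dst->zv = src->zv;
    }
}
```

After the three reference-bound variables are set up, toy_assign_val leaves their shared value at refcount 3 and hands d an independent value with is_ref=0 and refcount=1.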
1.1.5 Parameter passing
PHP function parameters are passed in the same way as variables are assigned. Passing by value is equivalent to a non-reference assignment and passing by reference is equivalent to a reference assignment, so parameter passing may also trigger a zval state switch. This will be mentioned later.
1.2 HashTable structure
HashTable is the most important and widely used data structure in Zend engine. It is used to store almost everything.
1.2.1 Data structure
The HashTable data structure is defined as follows:
typedef struct bucket {
    ulong h;                    /* stores the hash */
    uint nKeyLength;
    void *pData;                /* points to the value, which is a copy of the user data */
    void *pDataPtr;
    struct bucket *pListNext;   /* pListNext and pListLast form the doubly linked */
    struct bucket *pListLast;   /* list that spans the entire HashTable */
    struct bucket *pNext;       /* pNext and pLast form the doubly linked list of */
    struct bucket *pLast;       /* the Buckets that share one hash slot */
    char arKey[1];              /* key */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;         /* hash array */
    dtor_func_t pDestructor;    /* specified when the HashTable is initialized, called when a Bucket is destroyed */
    zend_bool persistent;       /* whether to use the C memory allocation routines */
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
In general, Zend's HashTable is a linked list hash that is also optimized for linear traversal.
A HashTable contains two data structures, a linked list hash and a doubly linked list. The former is used for fast key-value lookup, and the latter makes linear traversal and sorting convenient. Every Bucket belongs to both data structures.
A few explanations about this data structure:
• Why are doubly linked lists used in the linked list hash?
General linked list hashing only needs to operate by key, so a singly linked list is enough. However, Zend sometimes needs to delete a given Bucket from the linked list hash, which can be done very efficiently with a doubly linked list.
• What does nTableMask do?
This value is used to convert a hash value into an index into the arBuckets array. When initializing a HashTable, Zend first allocates nTableSize slots for the arBuckets array. nTableSize is the smallest power of two 2^n that is not less than the size requested by the user, so in binary it has the form 10...0. nTableMask = nTableSize - 1, which in binary is 01...1. Therefore h & nTableMask always falls in [0, nTableSize - 1], and Zend uses it as the index into the arBuckets array.
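The size rounding and the masking step can be checked with a small standalone sketch. The helpers toy_table_size and toy_bucket_index are hypothetical and only mirror the rule described above; in particular, the minimum size of 1 is an assumption of this sketch and not necessarily what Zend uses.

```c
#include <assert.h>

/* Smallest power of two that is >= requested (at least 1). */
unsigned int toy_table_size(unsigned int requested)
{
    unsigned int size = 1;
    assert(requested <= (1u << 31));   /* keeps the shift below from overflowing */
    while (size < requested) {
        size <<= 1;
    }
    return size;
}

/* Maps a hash value to a slot index in [0, table_size - 1]. */
unsigned int toy_bucket_index(unsigned long h, unsigned int table_size)
{
    unsigned int mask = table_size - 1;
    /* table_size must be a power of two for the mask trick to work */
    assert(table_size != 0 && (table_size & mask) == 0);
    return (unsigned int)(h & mask);
}
```

For a requested size of 10 the table size becomes 16 and the mask 15, so a hash of 37 (binary 100101) lands in slot 5. Masking gives the same result as h % nTableSize but avoids a division.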
• What does pDataPtr do?
Normally, when the user inserts a key-value pair, Zend copies the value and points pData at the copy. The copy requires a call to Zend's internal routine emalloc to allocate memory. This is a time-consuming operation, and it consumes more memory than the value itself needs (the extra memory is used to store a cookie), so a small value causes a large proportional waste. Considering that HashTable is mostly used to store pointer values, Zend introduces pDataPtr. When the value is as small as a pointer, Zend copies it directly into pDataPtr and points pData at pDataPtr. This avoids the emalloc call and also helps improve the cache hit rate.
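The pDataPtr optimization can be sketched like this. The type toy_bucket and the functions toy_store and toy_clear are hypothetical, and malloc stands in for emalloc; the point is only that a pointer-sized value is kept inside the bucket while anything else gets a separate allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *pData;      /* always points at the stored value */
    void *pDataPtr;   /* inline storage for pointer-sized values */
} toy_bucket;

/* Stores a copy of size bytes from data in the bucket.
 * Returns 1 if a heap allocation was needed, 0 if the value fit inline. */
int toy_store(toy_bucket *b, const void *data, size_t size)
{
    assert(size > 0);
    if (size == sizeof(void *)) {
        memcpy(&b->pDataPtr, data, sizeof(void *));
        b->pData = &b->pDataPtr;       /* no allocation at all */
        return 0;
    }
    b->pData = malloc(size);
    assert(b->pData != NULL);
    memcpy(b->pData, data, size);
    b->pDataPtr = NULL;
    return 1;
}

/* Releases the value; only the heap case owns separate memory. */
void toy_clear(toy_bucket *b)
{
    if (b->pData != (void *)&b->pDataPtr) {
        free(b->pData);
    }
    b->pData = NULL;
    b->pDataPtr = NULL;
}
```

Callers read the value through pData in both cases, so the optimization is invisible to them.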
• Why is the size of arKey only 1? Why not use a pointer to manage the key?
arKey is the array that stores the key, but its declared size is only 1, which is not enough to hold a key. The following code can be found in the HashTable insertion routine:
p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
It can be seen that Zend allocates for each Bucket a block of memory large enough for both the Bucket itself and the key. The upper half is the Bucket, the lower half is the key, and arKey "happens" to be the last member of the Bucket, so the key can be accessed through arKey. This technique is most common in memory management routines: when memory is requested, a block larger than the requested size is actually allocated, and the extra upper half, usually called a cookie, stores information about the block such as its size and the pointers to the previous and next blocks. Baidu's Transmit program uses this method.
The purpose of not using pointers to manage keys is to reduce one emalloc operation and to improve the Cache hit rate. Another necessary reason is that the key is fixed in most cases, and the entire Bucket will not be reallocated because the key becomes longer. This also explains why value is not allocated as an array as well - because value is mutable.
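The trailing-array layout can be sketched in standalone C. The struct toy_bucket and the function toy_bucket_new are hypothetical, malloc stands in for pemalloc, and the sketch assumes key_len counts the terminating '\0'. This is the classic "struct hack"; C99 code would normally declare a flexible array member (char arKey[];) instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_bucket {
    unsigned long h;
    unsigned int nKeyLength;
    char arKey[1];   /* must stay the last member: the key continues past it */
} toy_bucket;

/* Allocates the bucket header and the key in one block.
 * Assumption of this sketch: key_len includes the terminating '\0'. */
toy_bucket *toy_bucket_new(const char *key, unsigned int key_len, unsigned long h)
{
    toy_bucket *p;
    assert(key != NULL && key_len >= 1);
    /* the header already provides 1 byte of key storage, hence the "- 1" */
    p = malloc(sizeof(toy_bucket) - 1 + key_len);
    assert(p != NULL);
    p->h = h;
    p->nKeyLength = key_len;
    memcpy(p->arKey, key, key_len);
    return p;
}
```

One call to free releases the header and the key together, which is the saving over a separately allocated key described above.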
1.2.2 PHP Array
There is still an unanswered question about HashTable, that is, what does nNextFreeElement do?
Unlike general hashing, Zend's HashTable allows the user to specify the hash value directly and ignore the key, or even to omit the key altogether (in which case nKeyLength is 0). HashTable also supports an append operation, for which the user does not even specify the hash value and only provides the value. In that case Zend uses nNextFreeElement as the hash and then increments nNextFreeElement.
This behavior looks strange, because the value cannot be accessed by key and the structure hardly seems to be a hash at all. The key to understanding it is that PHP arrays are implemented with HashTable. Associative arrays add elements to the HashTable through the normal key-value mapping, and their keys are strings specified by the user. Non-associative arrays use the array subscript directly as the hash value and have no key. When associative and non-associative elements are mixed in one array, or when the array_push operation is used, nNextFreeElement is needed.
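The role of nNextFreeElement can be shown with a deliberately tiny model. toy_array and its functions are hypothetical, the elements live in a fixed array instead of a hash, and overwriting an existing index is not handled. The rule that an explicit index at or beyond the counter moves the counter past it is an assumption taken from observable PHP behavior (after $a[5] = x, $a[] = y uses index 6), not from the text above.

```c
#include <assert.h>

#define TOY_CAP 16

typedef struct {
    unsigned long h[TOY_CAP];        /* integer index of each element, in insertion order */
    long value[TOY_CAP];
    unsigned int nNumOfElements;
    unsigned long nNextFreeElement;  /* index that the next append will use */
} toy_array;

void toy_array_init(toy_array *a)
{
    a->nNumOfElements = 0;
    a->nNextFreeElement = 0;
}

/* Models $a[h] = v for a new index h. */
void toy_index_insert(toy_array *a, unsigned long h, long v)
{
    assert(a->nNumOfElements < TOY_CAP);
    a->h[a->nNumOfElements] = h;
    a->value[a->nNumOfElements] = v;
    a->nNumOfElements++;
    if (h >= a->nNextFreeElement) {
        a->nNextFreeElement = h + 1;   /* appends continue after the largest index */
    }
}

/* Models $a[] = v; returns the index that was assigned. */
unsigned long toy_append(toy_array *a, long v)
{
    unsigned long h = a->nNextFreeElement;
    toy_index_insert(a, h, v);
    return h;
}
```

Two appends on an empty array use indexes 0 and 1; after an explicit insert at index 5, the next append uses index 6.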
Looking at the value again, the value of the PHP array directly uses the general structure zval. pData points to zval*. According to the introduction in the previous section, this zval* will be directly stored in pDataPtr. Due to the direct use of zval, the elements of the array can be of any PHP type.
Array traversal operations, namely foreach, each, etc., are performed through the doubly linked list of HashTable, and pInternalPointer records the current position as a cursor.
1.2.3 Variable symbol table
In addition to arrays, HashTable is also used to store many other data, such as PHP functions, variable symbols, loaded modules, class members, etc.
A variable symbol table is equivalent to an associative array, its key is the variable name (it can be seen that using long variable names is not a good idea), and the value is zval*.
At any moment, PHP code can see two variable symbol tables, symbol_table and active_symbol_table. The former stores global variables and is called the global symbol table. The latter is a pointer to the currently active variable symbol table, which is usually the global symbol table. However, every time a PHP function is entered (this refers to functions that the user writes in PHP code), Zend creates a variable symbol table local to the function and points active_symbol_table at it. Zend always accesses variables through active_symbol_table, which is how the scope of local variables is controlled.
But if a variable marked as global is accessed locally in a function, Zend will perform special processing - create a reference to the variable with the same name in symbol_table in active_symbol_table. If there is no variable with the same name in symbol_table, it will be created first.
1.3 Memory and Files
The resources owned by a program generally include memory and files. For ordinary programs, these resources are process-oriented. When the process ends, the operating system or C library will automatically recycle those resources that we have not explicitly released.
However, PHP programs are special in that they are page-based. A running page also acquires resources such as memory or files, but when the page finishes, the operating system or C library may not know that those resources need to be reclaimed. For example, suppose we compile PHP into Apache as a module and run Apache in prefork or worker mode. The Apache process or thread is then reused, so memory allocated by a PHP page remains allocated after the page ends, in the worst case until the process dumps core.
To solve this problem, Zend provides a set of memory allocation APIs. They behave like the corresponding C functions, except that they allocate memory from Zend's own memory pool and can therefore be reclaimed automatically on a per-page basis. In our module, memory allocated for the page should use these APIs rather than the C routines; otherwise Zend will try to efree our memory at the end of the page, and the result is usually a crash.
emalloc()
efree()
estrdup()
estrndup()
ecalloc()
erealloc()
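How page-based recycling can work is sketched below with a toy allocator. This is not Zend's memory manager: toy_emalloc, toy_efree and toy_request_shutdown are hypothetical names, and the header holds only list links, whereas a real allocator would also record sizes and take more care over payload alignment. Every block is linked into a list so that whatever the page forgot to free can be released when the page ends.

```c
#include <assert.h>
#include <stdlib.h>

/* Header placed in front of every block handed out by toy_emalloc. */
typedef struct toy_block {
    struct toy_block *prev;
    struct toy_block *next;
} toy_block;

/* Circular list of all blocks that are still allocated. */
static toy_block toy_head = { &toy_head, &toy_head };

void *toy_emalloc(size_t size)
{
    toy_block *b = malloc(sizeof *b + size);
    assert(b != NULL);
    b->prev = &toy_head;
    b->next = toy_head.next;
    toy_head.next->prev = b;
    toy_head.next = b;
    return b + 1;                      /* the payload starts after the header */
}

void toy_efree(void *p)
{
    toy_block *b = (toy_block *)p - 1; /* step back to the header */
    b->prev->next = b->next;
    b->next->prev = b->prev;
    free(b);
}

/* Called when a page ends: frees every block the page did not free
 * itself and returns how many such blocks there were. */
int toy_request_shutdown(void)
{
    int n = 0;
    while (toy_head.next != &toy_head) {
        toy_efree(toy_head.next + 1);
        n++;
    }
    return n;
}
```

The sketch also shows why the two families must not be mixed: passing a pointer obtained from plain malloc to toy_efree would make it unlink a header that was never written.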
In addition, Zend provides a set of macros named VCWD_xxx to replace the corresponding file APIs of the C library and the operating system. These macros support PHP's virtual working directory and should always be used in module code. For the specific definitions of the macros, refer to "TSRM/tsrm_virtual_cwd.h" in the PHP source code. You may notice that none of these macros provides a close operation. This is because close operates on a resource that is already open and does not involve a file path, so the C or operating system routine can be used directly. Similarly, operations such as read and write also use the C or operating system routines directly.