PHP is said to be simple, but it is not easy to master it. In addition to being able to use it, we also need to know its underlying working principle.
PHP is a dynamic language suitable for web development. To be more specific, it is a software framework that uses C language to implement a large number of components. Looking at it in a more narrow sense, it can be considered as a powerful UI framework.
What is the purpose of understanding the underlying implementation of PHP? To use dynamic language well, we must first understand it. Memory management and framework models are worthy of our reference. Through extended development, we can achieve more powerful functions and optimize the performance of our programs.
The core architecture of PHP is as shown below:
As can be seen from the picture, PHP is a 4-layer system from bottom to top:
If PHP is a car, then the frame of the car is PHP itself, Zend is the engine of the car, and the various components under Ext are the wheels of the car. Sapi can be regarded as a road, and the car can run on different roads. type of highway, and the execution of a PHP program means the car is running on the highway. So we need: a good engine, the right wheels, the right track.
As mentioned before, Sapi allows external applications to exchange data with PHP through a series of interfaces and implement specific processing methods according to different application characteristics. Some of our common sapis are:
Let’s first take a look at the process of executing PHP code.
As you can see from the picture, PHP implements a typical dynamic language execution process: after getting a piece of code, after going through stages such as lexical analysis and syntax analysis, the source program will be translated into instructions (opcodes). The ZEND virtual machine then executes these instructions in sequence to complete the operation. PHP itself is implemented in C, so the functions that are ultimately called are all C functions. In fact, we can regard PHP as a software developed in C.
The core of PHP execution is the translated instructions, which are opcodes.
Opcode is the most basic unit of PHP program execution. An opcode consists of two parameters (op1, op2), return value and processing function. The PHP program is ultimately translated into the sequential execution of a set of opcode processing functions.
Several common processing functions:
1 |
ZEND_ASSIGN_SPEC_CV_CV_HANDLER : 变量分配 ( $a = $b )
|
2 |
ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER:函数调用 |
3 |
ZEND_CONCAT_SPEC_CV_CV_HANDLER:字符串拼接 $a . $b
|
4 |
ZEND_ADD_SPEC_CV_CONST_HANDLER: 加法运算 $a 2
|
5 |
ZEND_IS_EQUAL_SPEC_CV_CONST:判断相等 $a ==1
|
6 |
ZEND_IS_IDENTICAL_SPEC_CV_CONST:判断相等 $a ===1
|
HashTable is the core data structure of zend. It is used to implement almost all common functions in PHP. The PHP array we know is its typical application. In addition, inside zend, such as function symbol table, global variables, etc. Implemented based on hash table.
PHP’s hash table has the following characteristics:
Zend hash table implements the typical hash table hash structure, and at the same time provides the function of forward and reverse traversal of the array by attaching a doubly linked list. Its structure is as follows:
As you can see, the hash table has both a hash structure in the form of key->value and a doubly linked list mode, making it very convenient to support fast search and linear traversal.
01 |
getKeyHashValue h; |
02 |
index = n & nTableMask; |
03 |
Bucket *p = arBucket[index]; |
04 |
while (p) {
|
05 |
if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
|
06 |
RETURN p->data;
|
07 |
}
|
08 |
p=p->next;
|
09 |
} |
10 |
RETURN FALTURE; |
PHP is a weakly typed language and does not strictly distinguish the types of variables. PHP does not need to specify the type when declaring variables. PHP may perform implicit conversions of variable types during program execution. Like other strongly typed languages, explicit type conversion can also be performed in the program. PHP variables can be divided into simple types (int, string, bool), collection types (array resource object) and constants (const). All the above variables have the same structure zval under the hood.
Zval is another very important data structure in zend, used to identify and implement PHP variables. Its data structure is as follows:
Zval mainly consists of three parts:
Zvalue is used to save the actual data of a variable. Because multiple types need to be stored, zvalue is a union, thus implementing weak typing.
The corresponding relationship between PHP variable types and their actual storage is as follows:
1 |
IS_LONG -> lvalue
|
2 |
IS_DOUBLE -> dvalue
|
3 |
IS_ARRAY -> ht
|
4 |
IS_STRING -> str
|
5 |
IS_RESOURCE -> lvalue
|
Reference counting is widely used in memory recycling, string operations, etc. Variables in PHP are a typical application of reference counting. Zval's reference counting is implemented through the member variables is_ref and ref_count. Through reference counting, multiple variables can share the same data. Avoid the heavy consumption caused by frequent copying.
When performing an assignment operation, zend points the variable to the same zval and ref_count. When performing an unset operation, the corresponding ref_count-1. The destruction operation will only be performed when ref_count is reduced to 0. If it is a reference assignment, zend will modify is_ref to 1.
PHP variables realize variable sharing data through reference counting. What if you change the value of one of the variables? When trying to write a variable, if Zend finds that the zval pointed to by the variable is shared by multiple variables, it will copy a zval with a ref_count of 1 and decrement the refcount of the original zval. This process is called "zval separation". It can be seen that zend only performs copy operations when a write operation occurs, so it is also called copy-on-write (copy on write)
For reference variables, the requirements are opposite to those of non-reference types. Variables assigned by reference must be bundled. Modifying one variable modifies all bundled variables.
Integers and floating-point numbers are one of the basic types in PHP and are also simple variables. For integers and floating point numbers, the corresponding values are stored directly in zvalue. Their types are long and double respectively.
As can be seen from the zvalue structure, for integer types, unlike strongly typed languages such as C, PHP does not distinguish between int, unsigned int, long, long long and other types. For it, there is only one type of integer That is, long. From this, it can be seen that in PHP, the value range of integers is determined by the number of compiler bits and is not fixed.
For floating point numbers, similar to integers, it does not distinguish between float and double but only double.
In PHP, what should I do if the integer range is out of bounds? In this case, it will be automatically converted to double type. You must be careful about this, as many tricks are caused by this.
Like integers, character variables are also basic types and simple variables in PHP. As can be seen from the zvalue structure, in PHP, a string is composed of a pointer to the actual data and a length structure, which is similar to the string in C. Since the length is represented by an actual variable, unlike c, its string can be binary data (including
When adding, modifying, or appending string operations, PHP will reallocate memory to generate new strings. Finally, for security reasons, PHP will still addCommon string splicing methods and speed comparison:
Suppose there are the following 4 variables: $strA=‘123’; $strB = ‘456’; $intA=123; intB=456;
Now we will compare and explain the following string splicing methods:
1 |
$res = $strA . $strB 和 $res = “ $strA $strB ”
|
2 |
这种情况下,zend会重新malloc一块内存并进行相应处理,其速度一般 |
3 |
$strA = $strA . $strB
|
4 |
这种是速度最快的,zend会在当前strA基础上直接relloc,避免重复拷贝 |
5 |
$res = $intA . $intB
|
8 |
这会是最慢的一种方式,因为sprintf在PHP中并不是一个语言结构,本身对于格式识别和处理就需要耗费比较多时间,另外本身机制也是malloc。不过sprintf的方式最具可读性,实际中可以根据具体情况灵活选择。 |
PHP arrays are naturally implemented through Zend HashTable.
How to implement foreach operation? Foreach on an array is completed by traversing the doubly linked list in the hashtable. For index arrays, traversal through foreach is much more efficient than for, eliminating the need to search for key->value. The count operation directly calls HashTable->NumOfElements, O(1) operation. For a string like '123', zend will convert it to its integer form. $arr[‘123’] and $arr[123] are equivalent
Resource type variable is the most complex variable in PHP and is also a composite structure.
PHP’s zval can represent a wide range of data types, but it is difficult to fully describe custom data types. Since there is no efficient way to represent these composite structures, there is no way to use traditional operators on them. To solve this problem, you only need to refer to the pointer through an essentially arbitrary identifier (label), which is called a resource.
In zval, for resource, lval is used as a pointer, directly pointing to the address of the resource. Resource can be any composite structure. The familiar mysqli, fsock, memcached, etc. are all resources.
How to use resources:
Resources can persist long-term, not just after all variables referencing it go out of scope, but even after a request ends and a new request is made. These resources are called persistent resources because they persist throughout the life cycle of the SAPI unless specifically destroyed. In many cases, persistent resources can improve performance to a certain extent. For example, in our common mysql_pconnect, persistent resources allocate memory through pemalloc so that they will not be released when the request ends. For zend, there is no distinction between the two per se.
How are local variables and global variables implemented in PHP? For a request, PHP can see two symbol tables (symbol_table and active_symbol_table) at any time, with the former used to maintain global variables. The latter is a pointer pointing to the currently active variable symbol table. When the program enters a function, zend will allocate a symbol table x to it and point active_symbol_table to a. In this way, the distinction between global and local variables is achieved.
Get variable values: PHP's symbol table is implemented through hash_table. Each variable is assigned a unique identifier. When obtaining, the corresponding zval is found from the table according to the identifier and returned.
Using global variables in functions: In functions, we can use global variables by explicitly declaring global. Create a reference to the variable with the same name in symbol_table in active_symbol_table. If there is no variable with the same name in symbol_table, it will be created first.