one. Preface
HashTable is the soul of PHP, because HashTable is widely used in the Zend engine, such as variable table, constant table, function table, etc., which are adapted to HashTable storage. In addition , PHP's array is also implemented by using HashTble, so understanding PHP's HashTable can truly understand PHP.
For the convenience of reading, here are the basic concepts that appear in the implementation of HashTable . A hash table is a data structure that maps specific keys to specific values through a hash function. It maintains a one-to-one correspondence between keys and values.
Key: An indicator used to manipulate data, such as an index in a PHP array, or a string key, etc.
Slot (slot/bucket): A unit in the hash table used to save data, which is the container where the data is actually stored.
Hash function: A function that maps the key to the location of the slot where the data should be stored.
Hash collision: A situation where a hash function maps two different keys to the same index.
The hash table in PHP is implemented in Zend/zend_hash.h. Let’s first look at the data structure in the PHP implementation. PHP uses the following two data structures to implement the hash table. The HashTable structure is used to save the basic information required by the entire hash table, while the Bucket structure is used to save the specific data content, (see the end for the specific source code)
two. Example
So, taking creating a variable as an example, what exactly happens under the hood?
Steps to create variables: $str = "hello";
1: Create a zval structure and set its type IS_STRING
2: Set its value to hello
3: Add it to the symbol table
{ zval *fooval; MAKE_STD_ZVAL(fooval); ZVAL_STRING(fooval, "hello", 1); ZEND_SET_SYMBOL( EG(active_symbol_table) , "foo" , fooval); }
The first two steps are mentioned in the variable structure of the previous article,Details SeeThe storage mechanism of the PHP kernel (separation/change)
What is the symbol table?
Answer: The symbol table is a hash table, which stores The variable name ->The address of the zval structure of the variable
// zend/zend_globals.h 161 lines of symbol table
struct _zend_executor_globals { ... ... HashTable *active_symbol_table; /*活动符号表*/ HashTable symbol_table; /* 全局符号表 */ HashTable included_files; /* files already included */ ... }
When a function is executed, the "execution environment structure" of the function will be generated, including the function name, parameters, execution steps, class (if it is a method), and a symbol table generated for this function. The symbol table is stored uniformly On the stack. And point active_symbol_table to the newly generated symbol table
struct _zend_execute_data { struct _zend_op *opline; zend_function_state function_state; zend_op_array *op_array;//函数编译后的执行逻辑,编译后的opcode二进制代码,称为op_array zval *object; HashTable *symbol_table;//此函数的符号表地址 struct _zend_execute_data *prev_execute_data; zval *old_error_reporting; zend_bool nested; zval **original_return_value; zend_class_entry *current_scope; zend_class_entry *current_called_scope; zval *current_this; struct _zend_op *fast_ret; /* used by FAST_CALL/FAST_RET (finally keyword) */ call_slot *call_slots; call_slot *call; };
Through the following example, we will describe PHP's allocation of various storage spaces during the execution of the function, and explain why PHP's static variables can be shared.
When a function is executed, the "execution environment structure" of the function will be generated, including the function name, parameters, execution steps, and the class (if it is a method), And generate a symbol table for this function. The symbol table is placed on the stack. And the active_symbol_table points to the newly generated symbol table
解释:
1.执行t1时,形成t1的环境结构体,t1调入到执行栈,t1也有自己的符号表,符号表里边存储的变量对应这个t1环境(局部变量嘛)
2.执行t1到第三行,执行了t2,形成t2的环境结构体,t2入栈,t2也有自己的变量自己的符号表,与t1互不影响。
3.假使t1函数内部出现了递归调用t1,此时会生成第二个t1环境结构体,和【1】中是两个结构体,互不影响
函数执行时的栈变化
当函数调用时,为此函数生成了一个”执行环境变量”的结构体,里面存储了当前函数的名称,参数,对应的类....等等信息.称为_zend_execute_data {}结构体
struct _zend_execute_data { struct _zend_op *opline; zend_function_state function_state; zend_op_array *op_array;//函数编译后的执行逻辑,编译后的opcode二进制代码,称为op_array zval *object; HashTable *symbol_table;//此函数的符号表地址 struct _zend_execute_data *prev_execute_data; zval *old_error_reporting; zend_bool nested; zval **original_return_value; zend_class_entry *current_scope; zend_class_entry *current_called_scope; zval *current_this; struct _zend_op *fast_ret; /* used by FAST_CALL/FAST_RET (finally keyword) */ call_slot *call_slots; call_slot *call; };
这个结构体中,有2个重要的信息需要注意!:
{
*op_array ------>是函数的执行步骤,公用(静态变量字段存储于此!所以改一次依赖于此逻辑的函数全修改!)
*hash_table---->symbol_table 这个函数对应的符号表
}
思考一下: 1个函数,递归调用自己3次, 如t1
问:在栈上,肯定要有3个 execute_data生成.但是,这3个execute_data--->对应几个*op_array;
答:函数编译完了,生成一份*op_array,因为函数的执行逻辑是固定的.
问:生成了几个 symbol_table?
答:生成3个符号表.
结论:
1.每一个函数调用是都会生成自己的环境栈和符号表栈,不同的环境栈对应了自己的符号表栈,所以每个函数中的变量常量等,他们是有对应函数内的作用域限制
2.虽然每次会生成不同的环境栈与作用域,但是如果调用的是同一个函数,其 *op_array;是公用1份的,换句话说,t1递归调用自己,每次都会开辟一个环境栈区分独立,但是他们是同一个函数逻辑,所以op_array是一样的,而
三。其他
通过一个哈希算法,它总有碰撞的时候吧。PHP中的哈希表是使用拉链法来解决冲突 (具体点讲就是使用链表来存储哈希到同一个槽位的数据,Zend为了保存数据之间的关系使用了双向链表来链接元素)。
对于HashTable的初始化_zend_hash_init,
插入_zend_hash_add_or_update,
元素访问_zend_hash_add_or_find等操作,源码中有就不再这里叙述。
这样回头一想,变量表,常量表,函数表等,他们在PHP中都是靠HashTable来实现的,如[二]中叙述,hashtable是不是很强大呢?
Zend引擎哈希表结构和关系:
Zend/zend_hash.h 55行
typedef struct bucket { ulong h; /* Used for numeric indexing */ uint nKeyLength; void *pData; void *pDataPtr; struct bucket *pListNext; struct bucket *pListLast; struct bucket *pNext; struct bucket *pLast; const char *arKey; } Bucket; typedef struct _hashtable { uint nTableSize; // hash Bucket的大小,最小为8,以2x增长。 uint nTableMask; // nTableSize-1 , 索引取值的优化 uint nNumOfElements; // hash Bucket中当前存在的元素个数,count()函数会直接返回此值 ulong nNextFreeElement; // 下一个数字索引的位置 Bucket *pInternalPointer; // 当前遍历的指针(foreach比for快的原因之一) Bucket *pListHead; // 存储数组头元素指针 Bucket *pListTail; // 存储数组尾元素指针 Bucket **arBuckets; // 存储hash数组 dtor_func_t pDestructor; // 在删除元素时执行的回调函数,用于资源的释放 zend_bool persistent; //指出了Bucket内存分配的方式。如果persisient为TRUE,则使用操作系统本身的内存分配函数为Bucket分配内存,否则使用PHP的内存分配函数。 unsigned char nApplyCount; // 标记当前hash Bucket被递归访问的次数(防止多次递归) zend_bool bApplyProtection;// 标记当前hash桶允许不允许多次访问,不允许时,最多只能递归3次 #if ZEND_DEBUG int inconsistent; #endif } HashTable;
Zend/zend_compiles.h 261行,op_array结构代码
struct _zend_op_array { /* Common elements */ zend_uchar type; const char *function_name; zend_class_entry *scope; zend_uint fn_flags; union _zend_function *prototype; zend_uint num_args; zend_uint required_num_args; zend_arg_info *arg_info; /* END of common elements */ zend_uint *refcount; zend_op *opcodes; zend_uint last; zend_compiled_variable *vars; int last_var; zend_uint T; zend_uint nested_calls; zend_uint used_stack; zend_brk_cont_element *brk_cont_array; int last_brk_cont; zend_try_catch_element *try_catch_array; int last_try_catch; zend_bool has_finally_block; /* static variables support */ HashTable *static_variables; zend_uint this_var; const char *filename; zend_uint line_start; zend_uint line_end; const char *doc_comment; zend_uint doc_comment_len; zend_uint early_binding; /* the linked list of delayed declarations */ zend_literal *literals; int last_literal; void **run_time_cache; int last_cache_slot; void *reserved[ZEND_MAX_RESERVED_RESOURCES]; };
The above is the detailed content of PHP core - a brief explanation of PHP soul HashTble. For more information, please follow other related articles on the PHP Chinese website!