This article introduces the basic structure of PHP arrays.
It covers the underlying data structure and the insertion, search, and rehash processes of a PHP array.
struct _zend_array {
    zend_refcounted_h gc;
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar flags,
                zend_uchar nApplyCount,
                zend_uchar nIteratorsCount,
                zend_uchar consistency)
        } v;
        uint32_t flags;
    } u;
    uint32_t nTableMask;        // hash mask, the negative of nTableSize (nTableMask = -nTableSize)
    Bucket *arData;             // element storage; points to the first Bucket
    uint32_t nNumUsed;          // number of Buckets used so far
    uint32_t nNumOfElements;    // number of valid elements = nNumUsed - num(is_undef)
    uint32_t nTableSize;        // total table size, a power of 2, minimum 8
    uint32_t nInternalPointer;  // internal pointer (position used by current(), next(), etc.)
    zend_long nNextFreeElement; // next free numeric index: after arr[] = 1; arr["a"] = 2; arr[] = 3; nNextFreeElement = 2
    dtor_func_t pDestructor;
};

typedef struct _Bucket {
    zval val;         // the stored value
    zend_ulong h;     // hash value (or numeric index)
    zend_string *key; // string key or NULL for numerics
} Bucket;
Description:
When an element is stored, its value is first appended in order, and then the position of that value is recorded.
The array that holds the records is a hash table. Values are stored in arData in insertion order. The position of each value in arData is recorded in a slot idx, which is obtained by combining the hash of the key with nTableMask. The engine computes h | nTableMask rather than a true modulo; with nTableMask = -nTableSize this selects one of nTableSize slots.
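The slot calculation can be sketched as follows. This is a minimal illustration that assumes nTableMask = -nTableSize, as stated in the struct comment above; slot_for_hash is a hypothetical helper name, not an engine function. It relies on the usual two's-complement conversion when casting to int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Zend engine): maps a hash value to a
 * slot offset, assuming nTableMask = -nTableSize as in the struct comment. */
static int32_t slot_for_hash(uint64_t h, uint32_t table_size)
{
    /* The table size must be a power of two and at least 8. */
    assert(table_size >= 8 && (table_size & (table_size - 1)) == 0);

    uint32_t mask = (uint32_t)(0u - table_size); /* 0xFFFFFFF8 for size 8 */
    uint32_t n_index = (uint32_t)h | mask;       /* keeps only the low bits of h */

    /* Read as a signed number, the result lies in [-table_size, -1]. */
    return (int32_t)n_index;
}
```

The negative result reflects that the slot area sits in memory just before arData, so a slot is addressed with a negative offset from arData.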
When the array is initialized, the minimum table size is 8. Larger sizes are always powers of two: 16, 32, 64, and so on.
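As a rough sketch of the sizing rule, the function below rounds a requested capacity up to the next power of two, with a floor of 8. The name round_table_size and the plain doubling loop are assumptions made for illustration; the engine may compute the same result differently.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_TABLE_SIZE 8u /* minimum size stated in the text */

/* Hypothetical helper: round a requested capacity up to a power of two. */
static uint32_t round_table_size(uint32_t requested)
{
    /* Guard so that the doubling below cannot overflow 32 bits. */
    assert(requested <= 0x80000000u);

    uint32_t size = MIN_TABLE_SIZE;
    while (size < requested) {
        size <<= 1; /* 8 -> 16 -> 32 -> 64 ... */
    }
    return size;
}
```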
The idx area created when the array is initialized is filled with -1, and it is reset to -1 again during a rehash.
When an element is deleted from the array, its type is marked as is_undef and nNumOfElements is decremented by 1. If it was the last element, nNumUsed is also decremented by 1.
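The bookkeeping on deletion can be modelled like this. It is a simplified sketch: MiniArray, Slot, and mini_delete are hypothetical names, a bool stands in for the is_undef type marker, and the slot area and collision chains are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_undef; /* stands in for the is_undef type marker */
    int  value;
} Slot;

typedef struct {
    Slot     data[8];
    uint32_t num_used;     /* plays the role of nNumUsed */
    uint32_t num_elements; /* plays the role of nNumOfElements */
} MiniArray;

/* Hypothetical helper: delete the element stored at position pos. */
static void mini_delete(MiniArray *a, uint32_t pos)
{
    assert(pos < a->num_used && !a->data[pos].is_undef);

    a->data[pos].is_undef = true; /* the bucket stays in place as a hole */
    a->num_elements--;

    if (pos + 1 == a->num_used) {
        a->num_used--;            /* last element: the used area shrinks */
    }
}
```

Deleting from the middle leaves a hole that still counts toward num_used, which is why the two counters can differ until a rehash compacts the array.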
Insert:
Take $arr = ['a'=>1, 'b'=>2] as an example:
First, 1 is appended to the array with val.u2.next = -1. The hash of its key a is computed and mapped through nTableMask to a slot idx, and the position nindex of 1 in arData is saved in that slot.
Next, 2 is appended, also with val.u2.next = -1. The hash of its key b is mapped to a slot in the same way. If that slot already holds a value, a hash collision has occurred: the value currently in the slot is copied into the new element's val.u2.next, and the position nindex of 2 is written to the slot. Later collisions are handled the same way, so each slot heads a chain linked through u2.next.
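The insertion steps can be sketched with a small model. Everything here is a simplification under stated assumptions: MiniTable, mini_init, and mini_append are hypothetical names, the caller supplies the hash directly, the slot is chosen with h & (TABLE_SIZE - 1) as a stand-in for the nTableMask step, and existing keys are not checked.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE  8
#define INVALID_IDX (-1)

typedef struct {
    const char *key;
    uint32_t    h;     /* hash of key, supplied by the caller in this sketch */
    int         value;
    int32_t     next;  /* plays the role of val.u2.next */
} Bucket;

typedef struct {
    int32_t  slots[TABLE_SIZE]; /* the idx area */
    Bucket   data[TABLE_SIZE];  /* arData, filled in insertion order */
    uint32_t num_used;
} MiniTable;

static void mini_init(MiniTable *t)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        t->slots[i] = INVALID_IDX; /* every slot starts out empty */
    }
    t->num_used = 0;
}

/* Appends a new element and returns its position in data. */
static int32_t mini_append(MiniTable *t, const char *key, uint32_t h, int value)
{
    assert(t->num_used < TABLE_SIZE); /* no resizing in this sketch */

    int32_t  pos  = (int32_t)t->num_used++;
    uint32_t slot = h & (TABLE_SIZE - 1);
    Bucket  *b    = &t->data[pos];

    b->key   = key;
    b->h     = h;
    b->value = value;
    b->next  = t->slots[slot]; /* -1 if the slot was empty, else the old head */
    t->slots[slot] = pos;      /* the new element becomes the chain head */
    return pos;
}
```

Appending "a" with hash 1 and then "b" with hash 9 puts both in slot 1, so the second element's next field ends up pointing at position 0.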
Search:
Compute the hash of the key a and map it through nTableMask to get a slot idx, then read the position nindex stored in that slot. If the slot holds -1, the key does not exist. Otherwise, look at that position in arData. If the key stored there equals a, the element has been found. If it differs, a hash collision occurred during insertion, so follow u2.next to the next candidate in arData and compare again. The search ends when a matching key is found, or when u2.next is -1, which means the key is not in the array.
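A lookup along the collision chain might look like the sketch below. The names and layout are hypothetical: mini_find takes the slot area and the bucket array as separate arguments, the caller supplies the hash, and h & (TABLE_SIZE - 1) again stands in for the nTableMask step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE  8
#define INVALID_IDX (-1)

typedef struct {
    const char *key;
    uint32_t    h;     /* hash of key */
    int         value;
    int32_t     next;  /* plays the role of val.u2.next */
} Bucket;

/* Returns the bucket holding key, or NULL if the key is absent. */
static const Bucket *mini_find(const int32_t *slots, const Bucket *data,
                               const char *key, uint32_t h)
{
    assert(key != NULL);

    int32_t idx = slots[h & (TABLE_SIZE - 1)];
    while (idx != INVALID_IDX) {
        const Bucket *b = &data[idx];
        if (b->h == h && strcmp(b->key, key) == 0) {
            return b;  /* same hash and same key: found */
        }
        idx = b->next; /* different key: follow the chain */
    }
    return NULL;       /* reached -1: the key is not stored */
}
```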
Rehash:
A rehash first resets every entry in the idx area to -1, then walks a pointer p over arData starting from the first element. While no is_undef element has been seen, each element's slot is recomputed from its key hash and its position is written back into the idx area. When p reaches the first element marked is_undef, a second pointer j is set to that position. From then on, p keeps advancing: each element that is not is_undef is moved forward to position j and linked into the idx area again, and j then advances by one. When p is on an is_undef element, j does not move. After p has passed the last used element, nNumUsed is set to j. New elements are then inserted from that position, overwriting whatever stale buckets were left behind.
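The compaction pass can be sketched as follows. This is a simplified model, not the engine's code: MiniTable and mini_rehash are hypothetical names, a bool stands in for is_undef, and the slot is chosen with h & (TABLE_SIZE - 1) in place of the nTableMask step. For brevity it uses a single loop in which j trails p, rather than switching modes at the first hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE  8
#define INVALID_IDX (-1)

typedef struct {
    uint32_t h;        /* stored hash of the key */
    int      value;
    int32_t  next;     /* plays the role of val.u2.next */
    bool     is_undef; /* true for a deleted element */
} Bucket;

typedef struct {
    int32_t  slots[TABLE_SIZE]; /* the idx area */
    Bucket   data[TABLE_SIZE];  /* arData */
    uint32_t num_used;          /* plays the role of nNumUsed */
} MiniTable;

static void mini_rehash(MiniTable *t)
{
    assert(t->num_used <= TABLE_SIZE);

    for (int i = 0; i < TABLE_SIZE; i++) {
        t->slots[i] = INVALID_IDX;     /* reset the whole idx area */
    }

    uint32_t j = 0;                    /* next position to write to */
    for (uint32_t p = 0; p < t->num_used; p++) {
        if (t->data[p].is_undef) {
            continue;                  /* hole: p advances, j stays put */
        }
        if (j != p) {
            t->data[j] = t->data[p];   /* move the element forward */
        }
        uint32_t slot = t->data[j].h & (TABLE_SIZE - 1);
        t->data[j].next = t->slots[slot];
        t->slots[slot]  = (int32_t)j;  /* relink at the new position */
        j++;
    }
    t->num_used = j;                   /* later inserts continue from here */
}
```

After the pass, buckets at positions j and beyond still hold old data, but they lie outside the used area and are overwritten by later insertions.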