详解PHP中Array构造HashTable -PHP Tutorial-php.cn

Home

Backend Development

PHP Tutorial

详解PHP中Array构造HashTable

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 13, 2016 pm 01:08 PM

gt hash hashtable null

详解PHP中Array结构HashTable

我们知道PHP中的Array在内部是以Hash的结构进行存储的。本文主要重点也是对PHP中Array的静态结构和动态结构进行分析和记录。

这里的静态结构，是指存储PHP中Array数据时使用的数据结构，即所谓的HashTable。

动态结构，是指程序在运行过程中，Array数据的存储状态。

首先PHP中的hashTable的结构如下：

typedef struct bucket {
    ulong h;                        /* Used for numeric indexing */
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;


typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets; ? ? ? ? ?
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

Copy after login

一个PHP中的Array在内部对应一个HashTable,HashTable内部的四个Bucket类型的指针数据记录着数组实际存储的元素内容的地址。具体的内容，各字段名都可以自解释，不做多说明了。

如果只看这几行代码，可能无法理解PHP数组实际的工作原理，接下来，我们可以手工模拟一下PHP数组中的一些最简单的操作。

1. 从无到有

HashTable的初始化，首先需要给一个HashTable构造一个内存空间，具体代码如下：

//hash_func_t在函数内用不到，hash函数在PHP范围内都是固定的
int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;

    SET_INCONSISTENT(HT_OK);

    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U nTableSize = 1 nTableMask = 0; /* 0 means that ht->arBuckets is uninitialized */
    ht->pDestructor = pDestructor;
    ht->arBuckets = (Bucket**)&uninitialized_bucket; ? //实际的数据存储空间还未创建
    ht->pListHead = NULL;
    ht->pListTail = NULL;
    ht->nNumOfElements = 0; ? ? ? ? ? ? ? ? ? //表示数组内还没有一个元素，
    ht->nNextFreeElement = 0;
    ht->pInternalPointer = NULL;
    ht->persistent = persistent;
    ht->nApplyCount = 0;
    ht->bApplyProtection = 1;
    return SUCCESS;
}

Copy after login

上述代码可以理解为，为数组构造了一个总的大门，数据都可以经由这个门进入到自己对应的内存块中。当然现在门里还没有“座位”呢。

2. 数据插入

对于一个一无所有的空间，怎么给它加点东西呢？这就是数据的插入，即数据是如何保存到这个HashTable中的。

PHP的数组索引可以是数值或字符串，我们首先看字符串的索引如何存储，代码如下：

int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
{
	ulong h;
	uint nIndex;
	Bucket *p;

	IS_CONSISTENT(ht);

	if (nKeyLength nTableMask;

	p = ht->arBuckets[nIndex];
	while (p != NULL) {
		if (p->arKey == arKey ||
			((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) {
				if (flag & HASH_ADD) {
					return FAILURE;
				}
				HANDLE_BLOCK_INTERRUPTIONS();
#if ZEND_DEBUG
				if (p->pData == pData) {
					ZEND_PUTS("Fatal error in zend_hash_update: p->pData == pData\n");
					HANDLE_UNBLOCK_INTERRUPTIONS();
					return FAILURE;
				}
#endif
				if (ht->pDestructor) {
					ht->pDestructor(p->pData);
				}
				UPDATE_DATA(ht, p, pData, nDataSize);
				if (pDest) {
					*pDest = p->pData;
				}
				HANDLE_UNBLOCK_INTERRUPTIONS();
				return SUCCESS; ?//更新之后直接退出
		}
		p = p->pNext;
	}
	
	if (IS_INTERNED(arKey)) {
		p = (Bucket *) pemalloc(sizeof(Bucket), ht->persistent);
		if (!p) {
			return FAILURE;
		}
		p->arKey = (char*)arKey;
	} else {
		p = (Bucket *) pemalloc(sizeof(Bucket) + nKeyLength, ht->persistent);
		if (!p) {
			return FAILURE;
		}
		p->arKey = (char*)(p + 1);
		memcpy(p->arKey, arKey, nKeyLength);
	}
	p->nKeyLength = nKeyLength;
	INIT_DATA(ht, p, pData, nDataSize);
	p->h = h;
	CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
	if (pDest) {
		*pDest = p->pData;
	}

	HANDLE_BLOCK_INTERRUPTIONS();
	CONNECT_TO_GLOBAL_DLLIST(p, ht);
	ht->arBuckets[nIndex] = p;
	HANDLE_UNBLOCK_INTERRUPTIONS();

	ht->nNumOfElements++;
	ZEND_HASH_IF_FULL_DO_RESIZE(ht);		/* If the Hash table is full, resize it */
	return SUCCESS;
}

Copy after login

首先，检查数组空间是否初始化，代码如下：

#define CHECK_INIT(ht) do {                                             \
    if (UNEXPECTED((ht)->nTableMask == 0)) {                                \
        (ht)->arBuckets = (Bucket **) pecalloc((ht)->nTableSize, sizeof(Bucket *), (ht)->persistent);   \
        (ht)->nTableMask = (ht)->nTableSize - 1;                        \
    }                                                                   \
} while (0)

Copy after login

然后计算要插入的字符串索引的hash值，并与nTableMask做按位与，得到nindex，这个nIndex就是对应的bucket*在二维数组arBucket**中的偏移量。根据代码逻辑，如果nIndex位置不为空，则说明当前计算得到的hash值之前存在。如果连key也相同并且flag为HASH_ADD则失败，否则就是更新操作。如果是更新操作则不会对现有数组结构有任何影响，更新了对应的值之后直接退出即可。

在需要有新元素插入到HashTable时，构造好的新元素会经过两步来链入该HashTable

第一步代码如下：

#define CONNECT_TO_BUCKET_DLLIST(element, list_head)        \
    (element)->pNext = (list_head);                         \
    (element)->pLast = NULL;                                \
    if ((element)->pNext) {                                 \
        (element)->pNext->pLast = (element);                \
    }

Copy after login

在这一步中如果新元素的key的hash值之前存在过，则list_head为HashTable.arBucket[nIndex],nIndex怎么来的前面已经说过了。在这一步过后会将HashTable.arBucket[nIndex]赋值为当前的新元素，你懂得。

如果新元素的key对应的hash之前没有存在过，则list_head就为NULL，因为HashTable.arBucket[nIndex]为NULL。你也懂得。

第二步代码如下：

#define CONNECT_TO_GLOBAL_DLLIST(element, ht)               \
    (element)->pListLast = (ht)->pListTail;                 \
    (ht)->pListTail = (element);                            \
    (element)->pListNext = NULL;                            \
    if ((element)->pListLast != NULL) {                     \
        (element)->pListLast->pListNext = (element);        \
    }                                                       \
    if (!(ht)->pListHead) {                                 \
        (ht)->pListHead = (element);                        \
    }                                                       \
    if ((ht)->pInternalPointer == NULL) {                   \
        (ht)->pInternalPointer = (element);                 \
    }

Copy after login

关于这一步会对HashTable的内容有什么样的影响，请参看下面的动态示例。相信你也懂得。

动态示例：

现在我们假设数组中没有任何元素，则进行插入操作。现在我们按照代码的逻辑，手动模拟一下数据插入的过程：

插入第一个元素A,假设其key对应的hash值为1

则插入之后，内存中的状态如下：

HashTable.arBucket[1]=A;

HashTable.pListHead = A

HashTable.pListTail = A

HashTable.pInternalPointer = A

A.pNext = null

A.pLast = null

A.pListLast = null

A.pListNext = null

插入第二个元素B，假设其key对应的hash值为2

则插入之后内存的状态如下：

HashTable.arBucket[2] = B;

HashTable.pListHead = A

HashTable.pListTail = B

HashTable.pInternalPointer = A?????? //这个只在第一次的时候设置

A.pNext=null

A.pLast = null

A.pListNext = B

A.pListLast = null

B.pListLast = A

B.pListNext = null

B.pNext = null

B.pLast = null

插入第三个元素C，假设其key的hash值为1，和A相同

则插入之后内存状态如下：

HashTable.arBucket[1] = C;

HashTable.pListHead = A

HashTable.pListTail =C

HashTable.pInternalPointer = A?????? //这个只在第一次的时候设置

A.pNext=null

A.pLast = C

A.pListNext = B

A.pListLast = null

B.pNext = null

B.pLast = null

B.pListLast = A

B.pListNext = C

C.pNext = A

C.pLast = null

C.pListNext = null

C.pListLast = B

插入A，B，C三个值之后的内存中状态即为：

HashTable.arBucket[1] = C;

HashTable.pListHead = A

HashTable.pListTail =C

HashTable.pInternalPointer = A

A.pNext=null

A.pLast = C

A.pListNext = B

A.pListLast = null

B.pNext = null

B.pLast = null

B.pListLast = A

B.pListNext = C

C.pNext = A

C.pLast = null

C.pListNext = null

C.pListLast = B

OK，A、B、C三个元素都已插入了，现在我们要实现两个任务：

查找某key的元素值(value)：

如果我们要访问A元素，则提供A的key：key_a,得到对应的hash值为1

然后找HastTable.arBucket[1]。这时HastTable.arBucket[1]其实为C不是A，但由于C的key不等于A的key，因此，要沿着pNext的指针找下去，直到NULL，而此时C.pNext就是A，即找到了key_a对应的值A。

总之由key查找一个元素时，首先要hash，然后顺着hash后的索引位置的pNext指针一直找下去，直到NULL，如果遇到了和要查找的key相同的值，则找到，否则找不到。

遍历数组：

由于我们的例子中的key是字符串类型的，全部循环遍历不能用for。只能用foreach，那foreach的遍历是如何实现的呢？

简单，根据最后的HashTable的状态，我们从HastTable.pListHead开始沿着pListNext指针顺序找下去即可了。以本文例子为例，则结果为：

HashTable.pListHead====>A

A.pListNext?????????????????? ====>B

B.pListNext?????????????????? ====>C

则最后的遍历顺序就是A，B，C，发现foreach的遍历顺序是和元素插入到数组的顺序相关的。

如果插入的元素的key不是字符串，而是数值。则可以省去做计算hash值这一步，直接拿数值的key做为hash值使用。

这样就不存在hash冲突的问题，这样也就不会用到每个元素的pNext、pLast两个指针了，这两个指针都只会是NULL。

这样我们可以通过使用for循环来遍历数组了，因为不存在hash冲突。

同样，如果我们使用foreach来遍历数组的话，遍历顺序还是元素的插入顺序，这个你当然懂得。

ps：

本文并未对zend中的hash结够做全面的记录，只是对本文主题涉及到的逻辑的重点代码进行了分析和演示。同时也为了能抓住重点。有些代码并未列出，如：再hash的逻辑，和索引为数值类型数据的代码等。这些可在代码文件Zend/zend_hash.c中找到详细内容。

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hello Kitty Island Adventure: How To Get Giant Seeds

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

How Long Does It Take To Beat Split Fiction?

4 weeks ago By DDD

R.E.P.O. Save File Location: Where Is It & How to Protect It?

4 weeks ago By DDD

Two Point Museum: All Exhibits And Where To Find Them

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7375

Java Tutorial

1628

CakePHP Tutorial

1355

Laravel Tutorial

1267

PHP Tutorial

1216

Related knowledge

What are the differences between Huawei GT3 Pro and GT4? Dec 29, 2023 pm 02:27 PM

Many users will choose the Huawei brand when choosing smart watches. Among them, Huawei GT3pro and GT4 are very popular choices. Many users are curious about the difference between Huawei GT3pro and GT4. Let’s introduce the two to you. . What are the differences between Huawei GT3pro and GT4? 1. Appearance GT4: 46mm and 41mm, the material is glass mirror + stainless steel body + high-resolution fiber back shell. GT3pro: 46.6mm and 42.9mm, the material is sapphire glass + titanium body/ceramic body + ceramic back shell 2. Healthy GT4: Using the latest Huawei Truseen5.5+ algorithm, the results will be more accurate. GT3pro: Added ECG electrocardiogram and blood vessel and safety

Fix: Snipping tool not working in Windows 11 Aug 24, 2023 am 09:48 AM

Why Snipping Tool Not Working on Windows 11 Understanding the root cause of the problem can help find the right solution. Here are the top reasons why the Snipping Tool might not be working properly: Focus Assistant is On: This prevents the Snipping Tool from opening. Corrupted application: If the snipping tool crashes on launch, it might be corrupted. Outdated graphics drivers: Incompatible drivers may interfere with the snipping tool. Interference from other applications: Other running applications may conflict with the Snipping Tool. Certificate has expired: An error during the upgrade process may cause this issu simple solution. These are suitable for most users and do not require any special technical knowledge. 1. Update Windows and Microsoft Store apps

How to implement Redis Hash operation in php May 30, 2023 am 08:58 AM

Hash operation //Assign values to fields in the hash table. Returns 1 on success and 0 on failure. If the hash table does not exist, the table will be created first and then the value will be assigned. If the field already exists, the old value will be overwritten. $ret=$redis->hSet('user','realname','jetwu');//Get the value of the specified field in the hash table. If the hash table does not exist, return false. $ret=$redis->hGet('user','rea

How to Fix Can't Connect to App Store Error on iPhone Jul 29, 2023 am 08:22 AM

Part 1: Initial Troubleshooting Steps Checking Apple’s System Status: Before delving into complex solutions, let’s start with the basics. The problem may not lie with your device; Apple's servers may be down. Visit Apple's System Status page to see if the AppStore is working properly. If there's a problem, all you can do is wait for Apple to fix it. Check your internet connection: Make sure you have a stable internet connection as the "Unable to connect to AppStore" issue can sometimes be attributed to a poor connection. Try switching between Wi-Fi and mobile data or resetting network settings (General > Reset > Reset Network Settings > Settings). Update your iOS version:

Laravel development: How to generate password hash using Laravel Hash? Jun 17, 2023 am 10:59 AM

Laravel is currently one of the most popular PHP web frameworks, providing developers with many powerful features and components, among which LaravelHash is one of them. LaravelHash is a PHP library for password hashing that can be used to keep passwords secure and make your application's user data more secure. In this article, we will learn how LaravelHash works and how to use it to hash and verify passwords. Prerequisite knowledge in learning Lara

What is the difference between null and NULL in c language Sep 22, 2023 am 11:48 AM

The difference between null and NULL in C language is: null is a macro definition in C language, usually used to represent a null pointer, which can be used to initialize pointer variables, or to determine whether the pointer is null in a conditional statement; NULL is a macro definition in C language A predefined constant in , usually used to represent a null value, used to represent a null pointer, null pointer array or null structure pointer.

Using the isEmpty() method of the Hashtable class in Java to determine whether the hash table is empty Jul 24, 2023 pm 02:21 PM

In Java, the isEmpty() method of the Hashtable class is used to determine whether the hash table is empty. The hash table is one of the commonly used data structures in the Java collection framework. It implements the storage and retrieval of key-value pairs. In the Hashtable class, the isEmpty() method is used to determine whether the hash table is empty. This article will introduce how to use the isEmpty() method of the Hashtable class and provide corresponding code examples. First, we need to understand the Hashtable class. Hash

What do undefined and null mean? Nov 20, 2023 pm 02:39 PM

In JavaScript, both undefined and null represent the concept of "nothing": 1. undefined represents an uninitialized variable or a non-existent property. When a variable is declared but no value is assigned to it, the value of the variable is undefined , when accessing properties that do not exist in the object, the returned value is also undefined; 2. null represents an empty object reference. In some cases, the object reference can be set to null to release the memory it occupies.

See all articles