PHP与Python实现Hash比较(一)
PHP中的array,python中的dict都是通过hash表(哈希表或散列表)实现的,或者说array与dict本身就是hash结构,本文及后续文章将分别比较PHP与python源代码中对哈希表的实现算法,一来学习其设计思想,另外可用于避免开发过程中一些可能会降低效率或易引发bug的
PHP中的array,python中的dict都是通过hash表(哈希表或散列表)实现的,或者说array与dict本身就是hash结构,本文及后续文章将分别比较PHP与python源代码中对哈希表的实现算法,一来学习其设计思想,另外可用于避免开发过程中一些可能会降低效率或易引发bug的操作。
先来PHP。一切源于PHP的内置数据类型zval(见PHP_X_X/Zend/zend.h):
typedef union _zvalue_value { long lval; //long value double dval; //double value struct { char *val; int len; } str; HashTable *ht; //hash table value zend_object_value obj; } zvalue_value; struct _zval_struct { //Variable information zvalue_value value; //value zend_uint refcount_gc; zend_uchar type; //active type zend_uchar is_ref_gc; };
其中HashTable *ht即是PHP中用于表示Array类型的结构,在深究HashTable结构之前先了解哈希表的原理,在C语言中数组是通过自然数作为数组索引来存储数据的,而在PHP或python等这些语言中,哈希表是以key - value的方式存取的,要实现这一存储方式,则需要将任意可能的key对应或映射到数组或者内存的自然数序列索引上,即实现
index = hash(key)
hash()即为哈希函数。理想状态下的hash()可以将任意的key映射到均匀分布且不重叠的自然数集合中,但由于key的不确定性,这显然是不可能的,因而一个好的哈希函数应该可以尽可能地避免重叠或碰撞(collisions),而在PHP中实现这一功能的哈希函数采纳的是DJBX33A算法。源码中实现代码如下:
static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength) { register ulong hash = 5381; /* variant with the hash unrolled eight times */ for (; nKeyLength >= 8; nKeyLength -= 8) { hash = ((hash <p>据其注释中所解释的来看,DJBX33A (Daniel J. Bernstein, Times 33 with Addition)算法可简单描述为</p> <blockquote> <p>hash(i) = hash(i-1) * 33 + str[i]</p> </blockquote> <p>至于为何取33而不是其它数,解释说是对1 ~ 256进行分别进行测试后择优选择的结果,并没有理论上的支撑,而且初始的hash值为5381应该也没有什么特别特别的特别之处吧?到这里为止,首先可以确定的一条规则就是,<span style="color: #3498db">在PHP中定义使用数组时key的长度以最好不要超过7为妙</span>,便可省掉第一步的for循环,因而在考虑效率的前提下,道长当年所说的为了增加代码的可读性将变量名定为几十个字符甚至一句话显然是不可取的咯:P</p> <p>通过巧妙的算法,hash碰撞得以减少,但是并没有完全避免(例如:PHP哈希表碰撞攻击原理),既然冲突是不可避免的,那就只能想办法解决冲突,算法书里面对冲突的处理方案有很多,PHP采用的是拉链法,具体实现方法还是要先追寻其定义(见PHP_X_X/Zend/zend_hash.h):</p> <p class="highlight"></p><pre class="brush:php;toolbar:false">typedef struct bucket { ulong h; //Used for numeric indexing uint nKeyLength; void *pData; void *pDataPtr; struct bucket *pListNext; struct bucket *pListLast; struct bucket *pNext; struct bucket *pLast; const char *arKey; } Bucket; typedef struct _hashtable { uint nTableSize; uint nTableMask; uint nNumOfElements; ulong nNextFreeElement; Bucket *pInternalPointer; //Used for element traversal Bucket *pListHead; Bucket *pListTail; Bucket **arBuckets; dtor_func_t pDestructor; zend_bool persistent; unsigned char nApplyCount; zend_bool bApplyProtection; #if ZEND_DEBUG int inconsistent; #endif } HashTable;
最终hash表的key保存在Bucket.arKey,key长为Bucket.nKeyLength,哈希函数计算得到的哈希值存为Bucket.h,当冲突时通过引出一条静态链表来解决,其实现如下:
ZEND_API int zend_hash_exists(const HashTable *ht, const char *arKey, uint nKeyLength) { ulong h; uint nIndex; Bucket *p; IS_CONSISTENT(ht); h = zend_inline_hash_func(arKey, nKeyLength); nIndex = h & ht->nTableMask; p = ht->arBuckets[nIndex]; while (p != NULL) { if (p->arKey == arKey || ((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) { return 1; } p = p->pNext; } return 0; }
p = p->pNext即在已有元素之上开辟出新的位置存储冲突的下一个元素。至此,PHP中HashTable实现的基本思想就介绍完了,有空再把python的部分补上。
构建动态结构体的小trick
Bucket结构体的最后一个元素arKey被定义为char *arKey;也有看到char arKey[1],有人解释说利用变长结构体,加上有看到注释
char arKey[1]; /* Must be last element */
更是如坠云里雾里,还以为说 arKey 必须存放 HashTable 里面 key 字符串的最后一个字符…经过一番挣扎,发现原来不是这个意思,shit!(见what-is-your-favorite-c-programming-trick),所谓的变长结构体只是说在考虑到内存连续性条件下,为了实现结构体内部元素的动态分配,利用struct的性质,将需要动态分配的变量放在结构体最后,如此以来通过malloc动态分配给struct的内存超出结构体本身所需的部分sizeof(struct)
可以顺其自然地被最后一个元素所访问,从而实现了可变长的结构体,所以说,注释中的Must be last element不是说存放的是key的最后一个字符,而是必须放在结构体的最后一个元素!shit again(but a good trick:P)!
参考
- PHP哈希表结构的深入剖析
原文地址:PHP与Python实现Hash比较(一), 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Static binding (static::) implements late static binding (LSB) in PHP, allowing calling classes to be referenced in static contexts rather than defining classes. 1) The parsing process is performed at runtime, 2) Look up the call class in the inheritance relationship, 3) It may bring performance overhead.

The speed of mobile XML to PDF depends on the following factors: the complexity of XML structure. Mobile hardware configuration conversion method (library, algorithm) code quality optimization methods (select efficient libraries, optimize algorithms, cache data, and utilize multi-threading). Overall, there is no absolute answer and it needs to be optimized according to the specific situation.

It is impossible to complete XML to PDF conversion directly on your phone with a single application. It is necessary to use cloud services, which can be achieved through two steps: 1. Convert XML to PDF in the cloud, 2. Access or download the converted PDF file on the mobile phone.

JWT is an open standard based on JSON, used to securely transmit information between parties, mainly for identity authentication and information exchange. 1. JWT consists of three parts: Header, Payload and Signature. 2. The working principle of JWT includes three steps: generating JWT, verifying JWT and parsing Payload. 3. When using JWT for authentication in PHP, JWT can be generated and verified, and user role and permission information can be included in advanced usage. 4. Common errors include signature verification failure, token expiration, and payload oversized. Debugging skills include using debugging tools and logging. 5. Performance optimization and best practices include using appropriate signature algorithms, setting validity periods reasonably,

What are the magic methods of PHP? PHP's magic methods include: 1.\_\_construct, used to initialize objects; 2.\_\_destruct, used to clean up resources; 3.\_\_call, handle non-existent method calls; 4.\_\_get, implement dynamic attribute access; 5.\_\_set, implement dynamic attribute settings. These methods are automatically called in certain situations, improving code flexibility and efficiency.

There is no built-in sum function in C language, so it needs to be written by yourself. Sum can be achieved by traversing the array and accumulating elements: Loop version: Sum is calculated using for loop and array length. Pointer version: Use pointers to point to array elements, and efficient summing is achieved through self-increment pointers. Dynamically allocate array version: Dynamically allocate arrays and manage memory yourself, ensuring that allocated memory is freed to prevent memory leaks.

An application that converts XML directly to PDF cannot be found because they are two fundamentally different formats. XML is used to store data, while PDF is used to display documents. To complete the transformation, you can use programming languages and libraries such as Python and ReportLab to parse XML data and generate PDF documents.

XML can be converted to images by using an XSLT converter or image library. XSLT Converter: Use an XSLT processor and stylesheet to convert XML to images. Image Library: Use libraries such as PIL or ImageMagick to create images from XML data, such as drawing shapes and text.
