php-perl hash algorithm implementation (times33 hash algorithm)_PHP tutorial

WBOY
Release: 2016-07-13 10:42:47
Original
991 people have browsed it

复制代码 代码如下:

APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
                                                      apr_ssize_t *klen)
{
    unsigned int hash = 0;
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    apr_ssize_t i;

    /*
     * This is the popular `times 33' hash algorithm which is used by
     * perl and also appears in Berkeley DB. This is one of the best
     * known hash functions for strings because it is both computed
     * very fast and distributes very well.
     *
     * The originator may be Dan Bernstein but the code in Berkeley DB
     * cites Chris Torek as the source. The best citation I have found
     * is "Chris Torek, Hash function for text in C, Usenet message
     * <27038@mimsy.umd.edu> in comp.lang.c , October, 1990." in Rich
     * Salz's USENIX 1992 paper about INN which can be found at
     * .
     *
     * The magic of number 33, i.e. why it works better than many other
     * constants, prime or not, has never been adequately explained by
     * anyone. So I try an explanation: if one experimentally tests all
     * multipliers between 1 and 256 (as I did while writing a low-level
     * data structure library some time ago) one detects that even
     * numbers are not useable at all. The remaining 128 odd numbers
     * (except for the number 1) work more or less all equally well.
     * They all distribute in an acceptable way and this way fill a hash
     * table with an average percent of approx. 86%.
     *
     * If one compares the chi^2 values of the variants (see
     * Bob Jenkins ``Hashing Frequently Asked Questions'' at
     * http://burtleburtle.net/bob/hash/hashfaq.html for a description
     * of chi^2), the number 33 not even has the best value. But the
     * number 33 and a few other equally good numbers like 17, 31, 63,
     * 127 and 129 have nevertheless a great advantage to the remaining
     * numbers in the large set of possible multipliers: their multiply
     * operation can be replaced by a faster operation based on just one
     * shift plus either a single addition or subtraction operation. And
     * because a hash function has to both distribute good _and_ has to
     * be very fast to compute, those few numbers should be preferred.
     *
     *                  -- Ralf S. Engelschall
     */

    if (*klen == APR_HASH_KEY_STRING) {
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    }
    else {
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}

Translation of the function comment part: This is the famous times33 hash algorithm. This algorithm is adopted by the perl language and appears in Berkeley DB. It is the best known hash algorithm One, when processing hashes with strings as key values, it has extremely fast calculation efficiency and good hash distribution. The first person to propose this algorithm was Dan Bernstein, but the source code was actually published by Clris Torek in Berkeley DB. The best citation I can find says this: "Chris Torek, Text Hash Functions in C, Usenet News <<27038@mimsy.umd.edu> in comp.lang.c, October 1990." "Mentioned in Rich Salz's 1992 article discussing INN in the USENIX newspaper. This article can be found on. 33 is a wonderful number, why does it work better than other values? Regardless of whether it is important or not , but no one has ever been able to fully explain why. So here, I will try to explain it. If someone tries to test every number between 1 and 256 (like an underlying data structure I wrote some time ago library), he will find that no number's performance is particularly outstanding. The performance of the 128 odd numbers (except 1) is similar, and they can all achieve an acceptable hash distribution, and the average distribution rate is about 86 %. If you compare the variance values ​​(gibbon: statistical term, indicating the average deviation between a random variable and its mathematical expectation) among these 128 odd numbers (see Bob Jenkins's http: //burtleburtle.net/bob/hash/hashfaq.html, the description of the squared difference), the number 33 is not the best one. (gibbon: According to my understanding here, as usual, the smaller the variance, the more stable it is. , but since it is not clear here the calculation formula of the author's variance, and whether the larger the dispersion is, the better in the hash discrete table, so it is not clear whether the good performance here refers to a large variance value or a small variance value), but The number 33 and some other equally good numbers like 17, 31, 63, 127 and 129 still have a big advantage over the rest of the numbers when faced with a lot of hash operations, which is that these numbers can use bitwise multiplication By replacing it with addition and subtraction, the operation speed will be improved. After all, a good hash algorithm requires both good distribution and high calculation speed. There are very few numbers that can achieve both of these points at the same time.

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/633589.htmlTechArticleCopy code The code is as follows: APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key, apr_ssize_t *klen) { unsigned int hash = 0; const unsigned char *key = (const...
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template