Tables de hachage : collisions, redimensionnement, hachage-js tutoriel-php.cn

Hash Tables : Collisions, Resizing, Hashing

Suppose une compréhension de la notation Big O. Les exemples sont en JavaScript. Références d'informations "Cracking the Coding Interview" par Gayle Laakmann McDowell

Comprendre les tables de hachage

Que vous ayez entendu parler de dictionnaires, de cartes de hachage ou de tables de hachage, ils sont tous essentiellement les mêmes. Dans ce blog, nous référencerons cette structure de données sous forme de table de hachage par souci de simplicité. Commençons par définir ce qu'est une table de hachage. Une table de hachage est une structure de données qui mappe les clés aux valeurs sous la forme de paires clé-valeur pour une recherche très efficace. Il existe plusieurs façons de le mettre en œuvre.

Composants clés d'une table de hachage

En utilisant un tableau de listes chaînées et une fonction de hachage nous pouvons implémenter une table de hachage. Examinons plus en détail ce qu'est une fonction de hachage.

Qu'est-ce qu'une fonction de hachage ?

Une fonction de hachage est un élément crucial d'une table de hachage. Il s'agit d'un algorithme, généralement sous la forme d'une fonction, qui prend une entrée (ou « clé ») et renvoie une chaîne d'octets de taille fixe, généralement sous la forme d'un entier. La sortie est appelée un code de hachage ou simplement un hachage. L'objectif principal d'une fonction de hachage dans le contexte des tables de hachage est de mapper un code de hachage à un index valide d'un tableau de buckets/slots, à partir duquel la valeur souhaitée peut être trouvée. Ces compartiments/emplacements seraient des listes chaînées dans notre cas. Caractéristiques d'une bonne fonction de hachage :

Déterministe : Pour une entrée donnée, elle doit toujours produire la même sortie de hachage.
Distribution uniforme : il doit mapper les entrées attendues aussi uniformément que possible sur sa plage de sortie.
Efficace : Il doit être rapide à calculer.
Effet d'avalanche : un petit changement dans l'entrée devrait entraîner une sortie de hachage significativement différente.

Pourquoi utiliser un tableau de listes chaînées ?

L'utilisation d'un tableau de listes chaînées dans l'implémentation de tables de hachage est une technique courante connue sous le nom de chaînage. Cette approche offre plusieurs avantages :

Gestion des collisions : la principale raison d'utiliser un tableau de listes chaînées est de gérer efficacement les collisions. Lorsque deux clés sont hachées et mappées sur le même index, nous pouvons simplement ajouter la nouvelle paire clé-valeur à la liste chaînée à cet index.
Efficacité spatiale : Elle permet à la table de hachage de stocker plus d'éléments que la taille du tableau sous-jacent. Chaque emplacement de tableau peut contenir plusieurs éléments via sa liste chaînée.

Array:  [0] -> (key1, value1) -> (key2, value2)
        [1] -> (key3, value3)
        [2] -> (key4, value4) -> (key5, value5) -> (key6, value6)
        [3] -> Empty
        [4] -> (key7, value7)

Copier après la connexion

Dans cet exemple, les clés 1 et 2 sont hachées pour indexer 0, tandis que les clés 4, 5 et 6 sont toutes hachées pour indexer 2.

Insertion de paires clé-valeur

Maintenant que nous avons une bonne compréhension des fonctions de hachage et pourquoi nous utilisons le chaînage, passons en revue le processus d'insertion de paires clé-valeur dans une table de hachage :

Lors de l'insertion de la clé (n'importe quelle valeur), nous calculons d'abord le code de hachage de la clé (généralement un entier ou un long). Il est possible que deux clés différentes aient le même code de hachage puisqu'il peut y avoir un nombre infini de clés et un nombre fini d'ints.
Mappez le code de hachage sur un index du tableau. Une méthode courante pour mapper un code de hachage sur un tableau consiste à utiliser l'opérateur de module. (par exemple, hash(key) % array.length)). Il est possible que deux codes de hachage différents puissent correspondre au même index avec cette méthode.
Dans un index, il y a une liste chaînée de clés et de valeurs. Stockez la paire clé-valeur à cet index. Les collisions se produisent lorsque les clés ont les mêmes codes de hachage ou que les codes de hachage correspondent aux mêmes indices.

Accès aux paires clé-valeur

L'accès aux paires clé-valeur est très efficace dans une implémentation de table de hachage. Calculez simplement le code de hachage à partir de la clé, puis calculez l'index à partir du code de hachage et enfin recherchez dans la liste chaînée la valeur avec cette clé. En supposant une bonne implémentation, l'accès aux paires clé-valeur (insertion et suppression également) prend

O (1) O(1)

What Makes a Hash Table Implementation "Good"?

A well-implemented hash table should balance efficiency, space utilization, and collision handling. Here are the key factors that contribute to a good hash table implementation:

A Good Hash Function

The heart of any hash table is its hash function. A good hash function should:

Be quick to compute
Minimize collisions

Optimal Load Factor

The load factor is the ratio of filled slots to total slots in the hash table. Maintaining an appropriate load factor is crucial:

A typical sweet spot is between 0.6 and 0.75

Too low (< 0.5): Wastes space
Too high (> 0.8): Increases collision risk

Collision Resolution Techniques

Two primary methods for handling collisions are:

Chaining: Each table position stores a linked list of collided items. Simple to implement but can lead to slower lookups if chains become long.
Open Addressing: If a collision occurs, look for the next available slot. Keeps all data in the table but requires careful implementation to avoid clustering of stored data.

Note that chaining and open-addressing cannot coexist easily. Logically, it would not make sense to look for the next available slot but store collided items at a specific index.

Dynamic Resizing

As the number of elements grows, the hash table should resize to maintain performance:

Typically, the table size is doubled when the load factor exceeds a threshold. All elements need to be rehashed into the new, larger table.

This operation is expensive but infrequent, keeping the amortized time complexity at O(1).

JavaScript Implementation

This implementation will utilize resizing and chaining for collision resolution. We will assume that our keys are integers.

For the hash function + mapping, we will keep it very simple and simply perform the following given a key:

k e y % a r r a y c a p a c i t y key \hspace{0.2cm} \% \hspace{0.2cm} array \hspace{0.1cm} capacity

Classical OOP

class HashNode {
    constructor(key, value) {
        this.key = key;
        this.value = value;
        this.next = null;
    }
}

class HashTable {
    constructor(capacity = 16) {
        this.capacity = capacity;
        this.size = 0;
        this.buckets = new Array(this.capacity).fill(null);
        this.threshold = 0.75;
    }

    hash(key) {
        return key % this.capacity;
    }

    insert(key, value) {
        const index = this.hash(key);
        if (!this.buckets[index]) {
            this.buckets[index] = new HashNode(key, value);
            this.size++;
        } else {
            let currentNode = this.buckets[index];
            while (currentNode.next) {
                if (currentNode.key === key) {
                    currentNode.value = value;
                    return;
                }
                currentNode = currentNode.next;
            }
            if (currentNode.key === key) {
                currentNode.value = value;
            } else {
                currentNode.next = new HashNode(key, value);
                this.size++;
            }
        }

        if (this.size / this.capacity >= this.threshold) {
            this.resize();
        }
    }

    get(key) {
        const index = this.hash(key);
        let currentNode = this.buckets[index];
        while (currentNode) {
            if (currentNode.key === key) {
                return currentNode.value;
            }
            currentNode = currentNode.next;
        }
        return undefined;
    }

    remove(key) {
        const index = this.hash(key);
        if (!this.buckets[index]) {
            return false;
        }
        if (this.buckets[index].key === key) {
            this.buckets[index] = this.buckets[index].next;
            this.size--;
            return true;
        }
        let currentNode = this.buckets[index];
        while (currentNode.next) {
            if (currentNode.next.key === key) {
                currentNode.next = currentNode.next.next;
                this.size--;
                return true;
            }
            currentNode = currentNode.next;
        }
        return false;
    }

    resize() {
        const newCapacity = this.capacity * 2;
        const newBuckets = new Array(newCapacity).fill(null);
        this.buckets.forEach(head => {
            while (head) {
                const newIndex = head.key % newCapacity;
                const next = head.next;
                head.next = newBuckets[newIndex];
                newBuckets[newIndex] = head;
                head = next;
            }
        });
        this.buckets = newBuckets;
        this.capacity = newCapacity;
    }

    getSize() {
        return this.size;
    }

    getCapacity() {
        return this.capacity;
    }
}

Copier après la connexion

Functional OOP

function createHashTable(initialCapacity = 16) {
    let capacity = initialCapacity;
    let size = 0;
    let buckets = new Array(capacity).fill(null);
    const threshold = 0.75;

    function hash(key) {
        return key % capacity;
    }

    function resize() {
        const newCapacity = capacity * 2;
        const newBuckets = new Array(newCapacity).fill(null);
        buckets.forEach(function(head) {
            while (head) {
                const newIndex = head.key % newCapacity;
                const next = head.next;
                head.next = newBuckets[newIndex];
                newBuckets[newIndex] = head;
                head = next;
            }
        });
        buckets = newBuckets;
        capacity = newCapacity;
    }

    return {
        insert: function(key, value) {
            const index = hash(key);
            const newNode = { key, value, next: null };

            if (!buckets[index]) {
                buckets[index] = newNode;
                size++;
            } else {
                let currentNode = buckets[index];
                while (currentNode.next) {
                    if (currentNode.key === key) {
                        currentNode.value = value;
                        return;
                    }
                    currentNode = currentNode.next;
                }
                if (currentNode.key === key) {
                    currentNode.value = value;
                } else {
                    currentNode.next = newNode;
                    size++;
                }
            }

            if (size / capacity >= threshold) {
                resize();
            }
        },

        get: function(key) {
            const index = hash(key);
            let currentNode = buckets[index];
            while (currentNode) {
                if (currentNode.key === key) {
                    return currentNode.value;
                }
                currentNode = currentNode.next;
            }
            return undefined;
        },

        remove: function(key) {
            const index = hash(key);
            if (!buckets[index]) {
                return false;
            }
            if (buckets[index].key === key) {
                buckets[index] = buckets[index].next;
                size--;
                return true;
            }
            let currentNode = buckets[index];
            while (currentNode.next) {
                if (currentNode.next.key === key) {
                    currentNode.next = currentNode.next.next;
                    size--;
                    return true;
                }
                currentNode = currentNode.next;
            }
            return false;
        },

        getSize: function() {
            return size;
        },

        getCapacity: function() {
            return capacity;
        }
    };
}

Copier après la connexion

Ce qui précède est le contenu détaillé de. pour plus d'informations, suivez d'autres articles connexes sur le site Web de PHP en chinois!