Hash Tables : Collisions, Resizing, Hashing-JS Tutorial-php.cn

Assumes understanding of Big O notation. Examples are in JavaScript. Information references "Cracking the Coding Interview" by Gayle Laakmann McDowell

Understanding Hash Tables

Whether you've heard about dictionaries, or hash maps, or hash tables, they are all essentially the same. In this blog, we'll reference this data structure as a hash table for simplicity's sake. Let's start with defining what a hash table is. A hash table is a data structure that maps keys to values in the form of key-value pairs for highly efficient lookup. There are multiple ways to implement it.

Key Components of a Hash Table

Using an array of linked lists and a hashing function we can implement a hash table. Let's dive deeper into what a hashing function is.

What is a Hashing Function?

A hashing function is a crucial component of a hash table. It's an algorithm, typically in the form of a function, that takes an input (or 'key') and returns a fixed-size string of bytes, typically in the form of an integer. The output is called a hash code or simply a hash. The main purpose of a hashing function in the context of hash tables is to map a hash code to a valid index of an array of buckets/slots, from which the desired value can be found. These buckets/slots would be linked lists in our case. Characteristics of a Good Hashing Function:

Deterministic: For a given input, it should always produce the same hash output.
Uniform Distribution: It should map the expected inputs as evenly as possible over its output range.
Efficient: It should be quick to compute.
Avalanche Effect: A small change in the input should result in a significantly different hash output.

Why Use an Array of Linked Lists?

The use of an array of linked lists in hash table implementation is a common technique known as chaining. This approach offers several advantages:

Collision Handling: The primary reason for using an array of linked lists is to handle collisions efficiently. When two keys hash and map to the same index, we can simply add the new key-value pair to the linked list at that index.
Space Efficiency: It allows the hash table to store more items than the size of the underlying array. Each array slot can hold multiple items through its linked list.

Array:  [0] -> (key1, value1) -> (key2, value2)
        [1] -> (key3, value3)
        [2] -> (key4, value4) -> (key5, value5) -> (key6, value6)
        [3] -> Empty
        [4] -> (key7, value7)

Copy after login

In this example, keys 1 and 2 hash to index 0, while keys 4, 5, and 6 all hash to index 2.

Inserting Key-Value Pairs

Now that we have a good understanding of hash functions and why we use chaining, let's go through the flow of inserting key-value pairs into a hash table:

When inserting the key (any value), we first compute the key’s hash code (usually an int or long). It’s possible that two different keys could have the same hash code since there may be an infinite # of keys and finite # of ints.
Map the hash code to an index in the array. A common method for mapping a hash code to an array is using the modulus operator. (e.g., hash(key) % array.length)). It's possible that two different hash codes could map to the same index with this method.
At an index, there is a linked lists of keys and values. Store the key-value pair at this index. Collisions occur when keys have the same hash codes or hash codes map to the same indices.

Accessing Key-Value Pairs

Accessing key-value pairs is highly efficient in a hash table implementation. Simply compute the hash code from the key, then compute the index from the hash code, and finally search through the linked list for the value with this key. Assuming a good implementation, accessing key-value pairs (insertion and deletion too), takes

O (1) O(1)

What Makes a Hash Table Implementation "Good"?

A well-implemented hash table should balance efficiency, space utilization, and collision handling. Here are the key factors that contribute to a good hash table implementation:

A Good Hash Function

The heart of any hash table is its hash function. A good hash function should:

Be quick to compute
Minimize collisions

Optimal Load Factor

The load factor is the ratio of filled slots to total slots in the hash table. Maintaining an appropriate load factor is crucial:

A typical sweet spot is between 0.6 and 0.75

Too low (< 0.5): Wastes space
Too high (> 0.8): Increases collision risk

Collision Resolution Techniques

Two primary methods for handling collisions are:

Chaining: Each table position stores a linked list of collided items. Simple to implement but can lead to slower lookups if chains become long.
Open Addressing: If a collision occurs, look for the next available slot. Keeps all data in the table but requires careful implementation to avoid clustering of stored data.

Note that chaining and open-addressing cannot coexist easily. Logically, it would not make sense to look for the next available slot but store collided items at a specific index.

Dynamic Resizing

As the number of elements grows, the hash table should resize to maintain performance:

Typically, the table size is doubled when the load factor exceeds a threshold. All elements need to be rehashed into the new, larger table.

This operation is expensive but infrequent, keeping the amortized time complexity at O(1).

JavaScript Implementation

This implementation will utilize resizing and chaining for collision resolution. We will assume that our keys are integers.

For the hash function + mapping, we will keep it very simple and simply perform the following given a key:

k e y % a r r a y c a p a c i t y key \hspace{0.2cm} \% \hspace{0.2cm} array \hspace{0.1cm} capacity

Classical OOP

class HashNode {
    constructor(key, value) {
        this.key = key;
        this.value = value;
        this.next = null;
    }
}

class HashTable {
    constructor(capacity = 16) {
        this.capacity = capacity;
        this.size = 0;
        this.buckets = new Array(this.capacity).fill(null);
        this.threshold = 0.75;
    }

    hash(key) {
        return key % this.capacity;
    }

    insert(key, value) {
        const index = this.hash(key);
        if (!this.buckets[index]) {
            this.buckets[index] = new HashNode(key, value);
            this.size++;
        } else {
            let currentNode = this.buckets[index];
            while (currentNode.next) {
                if (currentNode.key === key) {
                    currentNode.value = value;
                    return;
                }
                currentNode = currentNode.next;
            }
            if (currentNode.key === key) {
                currentNode.value = value;
            } else {
                currentNode.next = new HashNode(key, value);
                this.size++;
            }
        }

        if (this.size / this.capacity >= this.threshold) {
            this.resize();
        }
    }

    get(key) {
        const index = this.hash(key);
        let currentNode = this.buckets[index];
        while (currentNode) {
            if (currentNode.key === key) {
                return currentNode.value;
            }
            currentNode = currentNode.next;
        }
        return undefined;
    }

    remove(key) {
        const index = this.hash(key);
        if (!this.buckets[index]) {
            return false;
        }
        if (this.buckets[index].key === key) {
            this.buckets[index] = this.buckets[index].next;
            this.size--;
            return true;
        }
        let currentNode = this.buckets[index];
        while (currentNode.next) {
            if (currentNode.next.key === key) {
                currentNode.next = currentNode.next.next;
                this.size--;
                return true;
            }
            currentNode = currentNode.next;
        }
        return false;
    }

    resize() {
        const newCapacity = this.capacity * 2;
        const newBuckets = new Array(newCapacity).fill(null);
        this.buckets.forEach(head => {
            while (head) {
                const newIndex = head.key % newCapacity;
                const next = head.next;
                head.next = newBuckets[newIndex];
                newBuckets[newIndex] = head;
                head = next;
            }
        });
        this.buckets = newBuckets;
        this.capacity = newCapacity;
    }

    getSize() {
        return this.size;
    }

    getCapacity() {
        return this.capacity;
    }
}

Copy after login

Functional OOP

function createHashTable(initialCapacity = 16) {
    let capacity = initialCapacity;
    let size = 0;
    let buckets = new Array(capacity).fill(null);
    const threshold = 0.75;

    function hash(key) {
        return key % capacity;
    }

    function resize() {
        const newCapacity = capacity * 2;
        const newBuckets = new Array(newCapacity).fill(null);
        buckets.forEach(function(head) {
            while (head) {
                const newIndex = head.key % newCapacity;
                const next = head.next;
                head.next = newBuckets[newIndex];
                newBuckets[newIndex] = head;
                head = next;
            }
        });
        buckets = newBuckets;
        capacity = newCapacity;
    }

    return {
        insert: function(key, value) {
            const index = hash(key);
            const newNode = { key, value, next: null };

            if (!buckets[index]) {
                buckets[index] = newNode;
                size++;
            } else {
                let currentNode = buckets[index];
                while (currentNode.next) {
                    if (currentNode.key === key) {
                        currentNode.value = value;
                        return;
                    }
                    currentNode = currentNode.next;
                }
                if (currentNode.key === key) {
                    currentNode.value = value;
                } else {
                    currentNode.next = newNode;
                    size++;
                }
            }

            if (size / capacity >= threshold) {
                resize();
            }
        },

        get: function(key) {
            const index = hash(key);
            let currentNode = buckets[index];
            while (currentNode) {
                if (currentNode.key === key) {
                    return currentNode.value;
                }
                currentNode = currentNode.next;
            }
            return undefined;
        },

        remove: function(key) {
            const index = hash(key);
            if (!buckets[index]) {
                return false;
            }
            if (buckets[index].key === key) {
                buckets[index] = buckets[index].next;
                size--;
                return true;
            }
            let currentNode = buckets[index];
            while (currentNode.next) {
                if (currentNode.next.key === key) {
                    currentNode.next = currentNode.next.next;
                    size--;
                    return true;
                }
                currentNode = currentNode.next;
            }
            return false;
        },

        getSize: function() {
            return size;
        },

        getCapacity: function() {
            return capacity;
        }
    };
}

Copy after login

The above is the detailed content of Hash Tables : Collisions, Resizing, Hashing. For more information, please follow other related articles on the PHP Chinese website!