Analysis of the implementation principle of HashMap in Java-javaTutorial-php.cn

Home

Java

javaTutorial

Analysis of the implementation principle of HashMap in Java

不言

Sep 11, 2018 pm 01:57 PM

hashmap

The content of this article is about the analysis of the implementation principle of HashMap in Java. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

1. HashMap overview:
HashMap is an asynchronous implementation of the Map interface based on hash tables. This implementation provides all optional mapping operations and allows null values and null keys. This class does not guarantee the order of the mapping, and in particular it does not guarantee that the order is immutable.
2. HashMap data structure:
In the Java programming language, there are two most basic structures, one is an array, and the other is a simulated pointer (reference). All data structures can be constructed using these two basic structures, and HashMap is no exception. HashMap is actually a "linked list hash" data structure, which is a combination of an array and a linked list.
As can be seen from the above figure, the bottom layer of HashMap is an array structure, and each item in the array is a linked list. When a new HashMap is created, an array will be initialized.

/** 
 * The table, resized as necessary. Length MUST Always be a power of two. 
 */  transient Entry[] table;  

static class Entry<K,V> implements Map.Entry<K,V> {  
    final K key;  
    V value;  
    Entry<K,V> next;  
    final int hash;  
    ……  
}

Copy after login

3. HashMap access implementation:
1) Storage:

public V put(K key, V value) {  
    // HashMap允许存放null键和null值。  
    // 当key为null时，调用putForNullKey方法，将value放置在数组第一个位置。  
    if (key == null)  
        return putForNullKey(value);  
    // 根据key的keyCode重新计算hash值。  
    int hash = hash(key.hashCode());  
    // 搜索指定hash值在对应table中的索引。  
    int i = indexFor(hash, table.length);  
    // 如果 i 索引处的 Entry 不为 null，通过循环不断遍历 e 元素的下一个元素。  
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {  
        Object k;  
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {  
            V oldValue = e.value;  
            e.value = value;  
            e.recordAccess(this);  
            return oldValue;  
        }  
    }  
    // 如果i索引处的Entry为null，表明此处还没有Entry。  
    modCount++;  
    // 将key、value添加到i索引处。  
    addEntry(hash, key, value, i);  
    return null;  
}

Copy after login

As can be seen from the above source code: when we put an element into the HashMap, we first recalculate the hash value based on the hashCode of the key, and get the element in the array based on the hash value. position (i.e. subscript), if there are other elements stored at this position in the array, then the elements at this position will be stored in the form of a linked list, with the newly added ones placed at the head of the chain, and the first added elements placed at the end of the chain. . If there is no element at that position in the array, the element is directly placed at that position in the array.
The addEntry(hash, key, value, i) method places the key-value pair at the i index of the array table based on the calculated hash value. addEntry is a package access method provided by HashMap. The code is as follows:

void addEntry(int hash, K key, V value, int bucketIndex) {  
    // 获取指定 bucketIndex 索引处的 Entry   
    Entry<K,V> e = table[bucketIndex];  
    // 将新创建的 Entry 放入 bucketIndex 索引处，并让新的 Entry 指向原来的 Entry  
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);  
    // 如果 Map 中的 key-value 对的数量超过了极限  
    if (size++ >= threshold)  
    // 把 table 对象的长度扩充到原来的2倍。  
        resize(2 * table.length);  
}

Copy after login

When the system decides to store the key-value pair in HashMap, it does not consider the value in Entry at all, but only calculates and calculates based on the key. Determine the storage location of each Entry. We can completely regard the value in the Map collection as an accessory to the key. After the system determines the storage location of the key, the value can be stored there.

The hash(int h) method recalculates the hash based on the hashCode of the key. This algorithm adds high-bit calculation to prevent hash conflicts caused when the low-bit remains unchanged and the high-bit changes.

static int hash(int h) {  
    h ^= (h >>> 20) ^ (h >>> 12);  
    return h ^ (h >>> 7) ^ (h >>> 4);  
}

Copy after login

We can see that to find an element in a HashMap, we need to find the position in the corresponding array based on the hash value of the key. How to calculate this position is the hash algorithm. As mentioned before, the data structure of HashMap is a combination of array and linked list, so of course we hope that the element positions in this HashMap are distributed as evenly as possible, so that there is only one element in each position. Then when we use the hash algorithm to obtain At this position, we can immediately know that the element at the corresponding position is what we want, without having to traverse the linked list, which greatly optimizes the efficiency of the query.

For any given object, as long as its hashCode() return value is the same, the hash code value calculated by the program calling the hash(int h) method is always the same. The first thing we think of is to take the hash value modulo the array length, so that the distribution of elements is relatively even. However, the consumption of "modulo" operation is still relatively large. This is done in HashMap: call the indexFor(int h, int length) method to calculate which index of the table array the object should be saved at. The code of the indexFor(int h, int length) method is as follows:

static int indexFor(int h, int length) {  
    return h & (length-1);  
}

Copy after login

This method is very clever. It uses h & (table.length -1) to get the storage bit of the object, and the length of the underlying array of HashMap It is always 2 to the nth power. This is the speed optimization of HashMap. There is the following code in the HashMap constructor:

int capacity = 1;  
    while (capacity < initialCapacity)  
        capacity <<= 1;

Copy after login

This code ensures that the capacity of HashMap is always 2 to the nth power during initialization, that is, the length of the underlying array is always 2 to the nth power.
When length is always 2 to the nth power, the h& (length-1) operation is equivalent to taking the modulus of length, that is, h%length, but & is more efficient than %.
When the length of the array is 2 raised to the nth power, the probability of calculating the same index for different keys is small, so the data is distributed evenly on the array, which means that the probability of collision is small. Relatively speaking, the query At this time, there is no need to traverse the linked list at a certain position, so the query efficiency is higher.

As can be seen from the source code of the put method above, when the program attempts to put a key-value pair into a HashMap, the program first determines the storage location of the Entry based on the hashCode() return value of the key: If the hashCode() return values of two Entry keys are the same, then their storage locations are the same. If the keys of these two Entries return true through equals comparison, the value of the newly added Entry will overwrite the value of the original Entry in the collection, but the key will not be overwritten. If the keys of these two Entries return false through equals comparison, the newly added Entry will form an Entry chain with the original Entry in the collection, and the newly added Entry is located at the head of the Entry chain - continue to see the description of the addEntry() method for specific instructions. .
(2) Read

public V get(Object key) {  
    if (key == null)  
        return getForNullKey();  
    int hash = hash(key.hashCode());  
    for (Entry<K,V> e = table[indexFor(hash, table.length)];  
        e != null;  
        e = e.next) {  
        Object k;  
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))  
            return e.value;  
    }  
    return null;  
}

Copy after login

有了上面存储时的hash算法作为基础，理解起来这段代码就很容易了。从上面的源代码中可以看出：从HashMap中get元素时，首先计算key的hashCode，找到数组中对应位置的某一元素，然后通过key的equals方法在对应位置的链表中找到需要的元素。

3) 归纳起来简单地说，HashMap 在底层将 key-value 当成一个整体进行处理，这个整体就是一个 Entry 对象。HashMap 底层采用一个 Entry[] 数组来保存所有的 key-value 对，当需要存储一个 Entry 对象时，会根据hash算法来决定其在数组中的存储位置，在根据equals方法决定其在该数组位置上的链表中的存储位置；当需要取出一个Entry时，也会根据hash算法找到其在数组中的存储位置，再根据equals方法从该位置上的链表中取出该Entry。
4. HashMap的resize（rehash）：
当HashMap中的元素越来越多的时候，hash冲突的几率也就越来越高，因为数组的长度是固定的。所以为了提高查询的效率，就要对HashMap的数组进行扩容，数组扩容这个操作也会出现在ArrayList中，这是一个常用的操作，而在HashMap数组扩容之后，最消耗性能的点就出现了：原数组中的数据必须重新计算其在新数组中的位置，并放进去，这就是resize。

那么HashMap什么时候进行扩容呢？当HashMap中的元素个数超过数组大小*loadFactor时，就会进行数组扩容，loadFactor的默认值为0.75，这是一个折中的取值。也就是说，默认情况下，数组大小为16，那么当HashMap中元素个数超过16*0.75=12的时候，就把数组的大小扩展为 2*16=32，即扩大一倍，然后重新计算每个元素在数组中的位置，而这是一个非常消耗性能的操作，所以如果我们已经预知HashMap中元素的个数，那么预设元素的个数能够有效的提高HashMap的性能。
5. HashMap的性能参数：
HashMap 包含如下几个构造器：
HashMap()：构建一个初始容量为 16，负载因子为 0.75 的 HashMap。
HashMap(int initialCapacity)：构建一个初始容量为 initialCapacity，负载因子为 0.75 的 HashMap。
HashMap(int initialCapacity, float loadFactor)：以指定初始容量、指定的负载因子创建一个 HashMap。
HashMap的基础构造器HashMap(int initialCapacity, float loadFactor)带有两个参数，它们是初始容量initialCapacity和加载因子loadFactor。
initialCapacity：HashMap的最大容量，即为底层数组的长度。
loadFactor：负载因子loadFactor定义为：散列表的实际元素数目(n)/ 散列表的容量(m)。
负载因子衡量的是一个散列表的空间的使用程度，负载因子越大表示散列表的装填程度越高，反之愈小。对于使用链表法的散列表来说，查找一个元素的平均时间是O(1+a)，因此如果负载因子越大，对空间的利用更充分，然而后果是查找效率的降低；如果负载因子太小，那么散列表的数据将过于稀疏，对空间造成严重浪费。
HashMap的实现中，通过threshold字段来判断HashMap的最大容量：

threshold = (int)(capacity * loadFactor);

Copy after login

结合负载因子的定义公式可知，threshold就是在此loadFactor和capacity对应下允许的最大元素数目，超过这个数目就重新resize，以降低实际的负载因子。默认的的负载因子0.75是对空间和时间效率的一个平衡选择。当容量超出此最大容量时， resize后的HashMap容量是容量的两倍：

if (size++ >= threshold)     
    resize(2 * table.length);

Copy after login

6. Fail-Fast机制：
我们知道java.util.HashMap不是线程安全的，因此如果在使用迭代器的过程中有其他线程修改了map，那么将抛出ConcurrentModificationException，这就是所谓fail-fast策略。

这一策略在源码中的实现是通过modCount域，modCount顾名思义就是修改次数，对HashMap内容的修改都将增加这个值，那么在迭代器初始化过程中会将这个值赋给迭代器的expectedModCount。

HashIterator() {  
    expectedModCount = modCount;  
    if (size > 0) { // advance to first entry  
    Entry[] t = table;  
    while (index < t.length && (next = t[index++]) == null)  
        ;  
    }  
}

Copy after login

在迭代过程中，判断modCount跟expectedModCount是否相等，如果不相等就表示已经有其他线程修改了Map：
注意到modCount声明为volatile，保证线程之间修改的可见性。

final Entry<K,V> nextEntry() {     
    if (modCount != expectedModCount)     
        throw new ConcurrentModificationException();

Copy after login

在HashMap的API中指出：

The iterators returned by the "collection view methods" of all HashMap classes are fail-fast: after the iterator is created, if the map is structurally modified, unless it is passed through the remove method of the iterator itself, any other If the time is modified in any way, the iterator will throw ConcurrentModificationException. Therefore, in the face of concurrent modifications, the iterator will quickly fail completely without risking arbitrary non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of iterators cannot be guaranteed. Generally speaking, it is impossible to make any firm guarantees when there are asynchronous concurrent modifications. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it is a mistake to write a program that relies on this exception; the correct approach is that the fail-fast behavior of iterators should be used only to detect program errors.

Related recommendations:

In-depth understanding of the implementation principle of HashMap in java (picture)

Detailed explanation of the principle and implementation of java lock-free hashmap

The above is the detailed content of Analysis of the implementation principle of HashMap in Java. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

3 weeks ago By DDD

Roblox: Dead Rails - How To Complete Every Challenge

4 weeks ago By DDD

Atomfall guide: item locations, quest guides, and tips

1 months ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7691

Java Tutorial

1639

CakePHP Tutorial

1393

Laravel Tutorial

1287

PHP Tutorial

1229

Related knowledge

What is the expansion mechanism of hashmap? Mar 15, 2023 pm 03:39 PM

The expansion mechanism of hashmap is to recalculate the capacity and replace the original array with a new array. Recalculate all the data of the original array and insert a new array, and then point to the new array; if the array has reached the maximum value before capacity expansion, directly set the threshold to the maximum integer and return it.

How to insert key-value pairs into HashMap using put() method of HashMap class Jul 26, 2023 pm 11:53 PM

How to insert key-value pairs into a HashMap using the put() method of the HashMap class. HashMap is a very important class in the Java collection framework. It provides a way to store key-value pairs. In actual development, we often need to insert key-value pairs into HashMap, which can be easily achieved by using the put() method of the HashMap class. The signature of HashMap's put() method is as follows: Vput(Kkey,Vvalue)

Based on Java HashMap, how to solve the problem of inserting duplicate Key values May 09, 2023 am 10:52 AM

Inserting duplicate Key values in javaHashMap To insert duplicate values in HashMap, you first need to figure out how elements are stored in HashMap. Each element stored in the put method Map is a key-value pair, and they are all added through the put method, and the same key will only have one associated value in the Map. The put method is defined as follows in Map. Vput(Kkey,Vvalue); put() method implementation: first hash(key) gets the hashcode() of the key, and hashmap finds the chain where the position to be inserted is based on the obtained hashcode.

Interpretation of Java documentation: Detailed explanation of usage of containsKey() method of HashMap class Nov 04, 2023 am 08:12 AM

Interpretation of Java documentation: Detailed explanation of the usage of the containsKey() method of the HashMap class. Specific code examples are required. Introduction: HashMap is a commonly used data structure in Java. It provides efficient storage and search functions. The containsKey() method is used to determine whether the HashMap contains the specified key. This article will explain in detail how to use the containsKey() method of the HashMap class and provide specific code examples. 1. cont

What is the difference between LinkedHashMap and HashMap in java May 02, 2023 am 08:31 AM

1. Explain that Map can basically use HashMap, but HashMap has a problem, that is, the order of iterating HashMap is not the order in which HashMap is placed, or it is out of order. This shortcoming of HashMap often causes trouble, because in some scenarios we expect an ordered Map, which is LinkedHashMap. 2. Difference instances publicstaticvoidmain(String[]args){Mapmap=newLinkedHashMap();map.put("apple","Apple");map.put(&quot

Java adds a Map to another Map using the putAll() function of the HashMap class Jul 24, 2023 am 09:36 AM

Java uses the putAll() function of the HashMap class to add a Map to another Map. Map is a commonly used data structure in Java and is used to represent a collection of key-value pairs. In Java's collection framework, HashMap is a commonly used implementation class. It provides the putAll() function, which is used to add one Map to another Map to facilitate data merging and copying. This article will introduce how to use the putAll() function and provide corresponding code examples. first,

Java Map performance optimization revealed: Make your data operations faster and more efficient Feb 20, 2024 am 08:31 AM

JavaMap is a commonly used data structure in the Java standard library, which stores data in the form of key-value pairs. The performance of Map is crucial to the running efficiency of the application. If the performance of Map is poor, it may cause the application to run slowly or even crash. 1. Choose the appropriate Map implementation Java provides a variety of Map implementations, including HashMap, TreeMap and LinkedHashMap. Each Map implementation has its own advantages and disadvantages. When choosing a Map implementation, you need to choose the appropriate implementation based on the specific needs of the application. HashMap: HashMap is the most commonly used Map implementation. It uses hash tables to store data and has faster insertion, deletion and search speeds.

How to use HashMap to cache data in Java singleton mode May 13, 2023 am 09:43 AM

1. What is the singleton pattern? The singleton pattern is an object creation pattern that is used to generate a specific instance of an object. It can ensure that only one instance of a class in the system is generated. The singleton implemented in Java is within the scope of a virtual machine. Because the function of loading a class belongs to the virtual machine, a virtual machine will create an instance of the class when it loads the singleton class through its own ClassLoad. In the Java language, such behavior can bring two major benefits: 1. For frequently used objects, the time spent on creating objects can be omitted, which is a very considerable system overhead for those heavyweight objects; 2. Since the number of new operations is reduced, the frequency of system memory usage will also be reduced, which will reduce GC pressure.

See all articles