When applying HashMap in java multi-threaded environment, there are mainly the following options: Use the thread-safe java.util.Hashtable as an alternative and use the java.util.Collections.synchronizedMap method to wrap the existing HashMap object as thread-safe . Use the java.util.concurrent.ConcurrentHashMap class as an alternative, it has very good performance.
The above methods all use mutex locks more or less in the specific details of implementation. Mutex locks will cause thread blocking, reduce operating efficiency, and may cause a series of problems such as deadlock and priority flipping.
CAS (Compare And Swap) is a function provided by the underlying hardware, which can atomicize the operation of judging and changing a value.
Atomic operations in Java
In the java.util.concurrent.atomic package, Java provides us with many convenient atomic types, which are entirely based on CAS operations at the bottom.
For example, if we want to implement a global public counter, we can:
privateAtomicInteger counter =newAtomicInteger(3); publicvoidaddCounter() { for(;;) { intoldValue = counter.get(); intnewValue = oldValue +1; if(counter.compareAndSet(oldValue, newValue)) return; } }
Among them, the compareAndSet method will check whether the existing value of counter is oldValue, and if so, set it to new If the value is newValue, the operation is successful and true is returned; otherwise the operation fails and false is returned.
When calculating the new value of counter, if other threads change the value of counter, compareAndSwap will fail. At this point we only need to add a layer of looping outside and keep trying this process, then we will eventually successfully increase the counter value by 1. (In fact, AtomicInteger has already defined the incrementAndGet and decrementAndGet methods for the commonly used +1/-1 operations. We only need to simply call it in the future)
In addition to AtomicInteger, the java.util.concurrent.atomic package also provides The AtomicReference and AtomicReferenceArray types are introduced, which represent atomic references and atomic reference arrays (arrays of references) respectively.
Implementation of lock-free linked list
Before implementing lock-free HashMap, let us first take a look at the relatively simple implementation method of lock-free linked list.
Take the insertion operation as an example:
First we need to find the node A in front of the position to be inserted and the node B behind it.
Then create a new node C and make its next pointer point to node B. (See Figure 1)
Finally make the next pointer of node A point to node C. (See Figure 2)
But in the middle of the operation, it is possible that other threads directly inserted some nodes in A and B (assumed to be D). If we do not do anything Judgment, it may cause the loss of nodes inserted by other threads. (See Figure 3) We can use the CAS operation to determine whether the next pointer of node A still points to B when assigning a value. If the next pointer of node A changes, retry the entire insertion operation. The approximate code is as follows:
privatevoidlistInsert(Node head, Node c) { for(;;) { Node a = findInsertionPlace(head), b = a.next.get(); c.next.set(b); if(a.next.compareAndSwap(b,c)) return; } }
(The next field of the Node class is of type AtomicReference
The search operation of a lock-free linked list is no different from that of an ordinary linked list. . The deletion operation requires finding the node A in front of the node to be deleted and the node B behind it, and using the CAS operation to verify and update the next pointer of node A so that it points to node B.
Difficulties and breakthroughs in lock-free HashMap
HashMap mainly has four basic operations: insertion, deletion, search and ReHash. A typical HashMap implementation uses an array, and each element of the array is a linked list of nodes. For this linked list, we can use the operation methods mentioned above to perform insertion, deletion and search operations, but the ReHash operation is more difficult.
As shown in Figure 4, during the ReHash process, a typical operation is to traverse each node in the old table, calculate its position in the new table, and then add it Move to new table. During this period, we need to manipulate pointers three times:
Point A's next pointer to D
Point B's next pointer to C
Point C's next pointer to E
These three pointer operations must be completed at the same time. Only in this way can the atomicity of the move operation be guaranteed. But it is not difficult to see that the CAS operation can only ensure that the value of one variable is atomically verified and updated at a time, and cannot meet the need to verify and update three pointers at the same time.
So we might as well change our thinking. Since the operation of moving nodes is so difficult, we can keep all nodes in an ordered state at all times, thus avoiding the moving operation. In a typical HashMap implementation, the length of the array always remains 2i, and the process of mapping the Hash value to the array subscript simply performs a modulo operation on the array length (that is, only the last i bits of the Hash binary are retained). When ReHash, the array length is doubled to 2i+1, and each node in the j-th necklace list of the old array is either moved to the j-th item in the new array, or moved to the j+2i-th item in the new array, and their The only difference lies in the i+1th bit of the Hash value (if the i+1th bit is 0, it is still the jth item, otherwise it is the j+2ith item).
如图5,我们将所有节点按照Hash值的翻转位序(如1101->1011)由小到大排列。当数组大小为8时,2、18在一个组内;3、 11、27在另一个组内。每组的开始,插入一个哨兵节点,以方便后续操作。为了使哨兵节点正确排在组的最前方,我们将正常节点Hash的最高位(翻转后变为最低位)置为1,而哨兵节点不设置这一位。
当数组扩容至16时(见图6),第二组分裂为一个只含3的组和一个含有11、27的组,但节点之间的相对顺序并未改变。这样在ReHash时,我们就不需要移动节点了。
实现细节
由于扩容时数组的复制会占用大量的时间,这里我们采用了将整个数组分块,懒惰建立的方法。这样,当访问到某下标时,仅需判断此下标所在块是否已建立完毕(如果没有则建立)。
另外定义size为当前已使用的下标范围,其初始值为2,数组扩容时仅需将size加倍即可;定义count代表目前HashMap中包含的总节点个数(不算哨兵节点)。
初始时,数组中除第0项外,所有项都为null。第0项指向一个仅有一个哨兵节点的链表,代表整条链的起点。初始时全貌见图7,其中浅绿色代表当前未使用的下标范围,虚线箭头代表逻辑上存在,但实际未建立的块。
初始化下标操作
数组中为null的项都认为处于未初始化状态,初始化某个下标即代表建立其对应的哨兵节点。初始化是递归进行的,即若其父下标未初始化,则先初始化其父下标。(一个下标的父下标是其移除最高二进制位后得到的下标)大致代码如下:
privatevoidinitializeBucket(intbucketIdx) { intparentIdx = bucketIdx ^ Integer.highestOneBit(bucketIdx); if(getBucket(parentIdx) ==null) initializeBucket(parentIdx); Node dummy =newNode(); dummy.hash = Integer.reverse(bucketIdx); dummy.next =newAtomicReference<>(); setBucket(bucketIdx, listInsert(getBucket(parentIdx), dummy)); }
其中getBucket即封装过的获取数组某下标内容的方法,setBucket同理。listInsert将从指定位置开始查找适合插入的位置插入给定的节点,若链表中已存在hash相同的节点则返回那个已存在的节点;否则返回新插入的节点。
插入操作
首先用HashMap的size对键的hashCode取模,得到应插入的数组下标。
然后判断该下标处是否为null,如果为null则初始化此下标。
构造一个新的节点,并插入到适当位置,注意节点中的hash值应为原hashCode经过位翻转并将最低位置1之后的值。
将节点个数计数器加1,若加1后节点过多,则仅需将size改为size*2,代表对数组扩容(ReHash)。
查找操作
找出待查找节点在数组中的下标。
判断该下标处是否为null,如果为null则返回查找失败。
从相应位置进入链表,顺次寻找,直至找出待查找节点或超出本组节点范围。
删除操作
找出应删除节点在数组中的下标。
判断该下标处是否为null,如果为null则初始化此下标。
找到待删除节点,并从链表中删除。(注意由于哨兵节点的存在,任何正常元素只被其唯一的前驱节点所引用,不存在被前驱节点与数组中指针同时引用的情况,从而不会出现需要同时修改多个指针的情况)
将节点个数计数器减1。
更多Detailed explanation of the principle and implementation of java lock-free hashmap相关文章请关注PHP中文网!