What is consistent hashing-Common Problem-php.cn

What is consistent hashing

hzc

Release： 2020-06-29 13:14:54

Original

2712 people have browsed it

The consistent hash algorithm was proposed by MIT in 1997. It is a special hash algorithm that aims to solve the problem of distributed caching. When removing or adding a server, it can It is possible to slightly change the mapping between existing service requests and the servers that handle them.

What is consistent hashing

The consistent hash algorithm was proposed by MIT in 1997. It is a special hash algorithm designed to solve the problem of distributed cache. . When removing or adding a server, the mapping between existing service requests and the servers handling the requests can be changed as little as possible. Consistent hashing solves the problems of dynamic scaling of simple hashing algorithms in distributed hash tables (DHT).

Introduction:

The consistent hash algorithm was proposed in the paper Consistenthashing and random trees in 1997 and is widely used in distributed systems. Consistent hashing is a hashing algorithm. Simply put, when a server is removed or added, this algorithm can change the mapping relationship between existing service requests and the request-processing server as little as possible, and satisfy monotony as much as possible. sexual requirements. In an ordinary distributed cluster, there is a one-to-one correspondence between service requests and processing request servers. That is to say, the mapping relationship between service requests and processing servers is fixed, and a certain request is processed by a fixed server. This method cannot load balance the entire system and may cause some servers to be too busy to handle new requests. While other servers are too idle, the resource utilization of the overall system is low, and when a server in the distributed cluster goes down, it will directly cause certain service requests to be unable to be processed.

Further improvement can use the hash algorithm to map the relationship between service requests and processing servers to achieve the purpose of dynamic allocation. The service request is converted through the hash algorithm, and the converted result is modulo calculated on the server node value. The modulo value is the request processing server corresponding to the service request. This method can cope with node failure. When a distributed cluster node goes down, service requests can be redistributed to other available servers through the hash algorithm. This avoids the situation where the request cannot be processed.

However, the shortcomings of this method are also obvious. If the data corresponding to the service request is stored in the server, then if the hash value of the request is recalculated, a large number of requests will be redirected to different servers. The data to be used in the request is invalid, which is very bad in a distributed system. A well-designed distributed system should have good monotonicity, that is, the addition and removal of servers will not cause a large number of hash relocations, and consistent hashing can exactly solve this problem.

The consistent hash algorithm maps the entire hash value space into a virtual ring, and the value range of the entire hash space is 0~232-1. The entire space is organized in a clockwise direction. The directions of 0~232-1 coincide at the zero point. Next, use the following algorithm to map the service request, use the hash algorithm to calculate the corresponding hash value of the service request, and then search clockwise along the circle according to the position of the hash value. The first server encountered is the corresponding processing request. server. When a new server is added, the affected data is only the data between the newly added server and the previous server in its ring space (that is, the first server encountered in the counterclockwise direction). The other data are will not be affected. To sum up, the consistent hashing algorithm only needs to relocate a small part of the data in the ring space for the increase or decrease of nodes, and has good fault tolerance and scalability.

Features:

Scalability. The consistent hash algorithm ensures minimal changes in data storage when adding or removing servers, greatly saving data movement overhead compared to traditional hash algorithms.
Better adapt to the rapid growth of data. Consistent hashing algorithm is used to distribute data. When the data continues to grow, some virtual nodes may contain a lot of data, causing the data to be unevenly distributed on the virtual nodes. At this time, the virtual nodes containing a lot of data can be split. This split is only The original virtual node is divided into two, and there is no need to re-hash and divide all the data. After the virtual nodes are split, if the load on the physical servers is still unbalanced, you only need to adjust the storage distribution of some virtual nodes among the servers. This can dynamically expand the number of physical servers as data grows, and the cost is much lower than redistributing all data with traditional hashing algorithms.

The above is the detailed content of What is consistent hashing. For more information, please follow other related articles on the PHP Chinese website!