This article explains how the consistent hash algorithm works and the data skew problem it can suffer from, then introduces how virtual nodes solve that problem, and finally analyzes how the algorithm is used in Dubbo.
Here is a quote from Dubbo's official website:
The responsibility of LoadBalance is to distribute network requests, or other forms of load, "evenly" across different machines, avoiding the situation where some servers in a cluster are under excessive pressure while others sit relatively idle. Through load balancing, each server receives a load suited to its own processing capacity; it relieves overloaded servers while also avoiding wasted resources, killing two birds with one stone. Load balancing can be divided into software load balancing and hardware load balancing. In day-to-day development we rarely get to touch hardware load balancers, but software load balancers such as Nginx are readily available. Dubbo also has the concept of load balancing, along with corresponding implementations.
To avoid putting excessive load on particular service providers, Dubbo needs to distribute the call requests coming from service consumers. An overloaded provider may cause some requests to time out, so balancing the load across providers is essential. Dubbo offers four load balancing implementations: RandomLoadBalance, based on a weighted random algorithm; LeastActiveLoadBalance, based on the least active calls algorithm; ConsistentHashLoadBalance, based on consistent hashing; and RoundRobinLoadBalance, based on a weighted round-robin algorithm. The code for these algorithms is not especially long, but understanding their principles requires some background knowledge.
**The consistent hash algorithm was proposed in 1997 by Karger and his collaborators at MIT, originally for load balancing in large-scale cache systems.** It works as follows. First, a hash is computed for each cache node based on its IP address or other identifying information, and that hash is projected onto a ring covering [0, 2³² − 1]. Whenever a query or write request arrives, a hash value is computed for the cache item's key, and the first cache node whose hash is greater than or equal to that value is located on the ring; the cache query or write is performed on that node. If that node goes down, the next cache query or write simply continues around the ring to the next cache node with a greater hash. The general effect is shown in the figure below: each cache node occupies a position on the ring, and a cache item is stored on (or read from) the first node encountered clockwise from the item's hash. The cache item marked in green below is stored on node cache-2, and items that would originally have gone to cache-3 end up on cache-4 because cache-3 is down.
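The lookup process described above can be sketched with a minimal ring built on a `TreeMap`. The node and key names are made up, and the hash function (first four bytes of MD5) is just one reasonable choice of a stable hash onto [0, 2³² − 1]:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

// A minimal consistent-hash ring, purely to illustrate the idea.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Map a string onto the 32-bit ring using the first 4 bytes of its MD5.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return (((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF)) & 0xFFFFFFFFL;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String node)    { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    // Clockwise lookup: first node at or after the key's hash, wrapping
    // around to the head of the map when the key hashes past the last node.
    String locate(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Removing a node only reroutes the keys that hashed into its interval; every other key keeps its current owner, which is exactly the property the next paragraph describes.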
Figure 4 Consistent Hash Algorithm
In the consistent hash algorithm, whether a node is added or goes down, only the interval between the affected node and the first server encountered counterclockwise on the hash ring is impacted; all other intervals remain unaffected.
But consistent hashing also has problems:
Figure 5 Data skew
When there are only a few nodes, a distribution like the one above may occur, where service A bears the vast majority of requests. This situation is called data skew.
So how do we solve the data skew problem? By introducing virtual nodes.
First of all, each server can have multiple virtual nodes as needed. Suppose a server has n virtual nodes. For the hash calculation, a combination of IP address, port number, and a replica index can be used, where the index runs from 0 to n − 1. These n virtual nodes all point to the same physical machine, because they share the same IP address and port number.
Figure 6 Introducing a virtual node
Before virtual nodes were added, server A shouldered the vast majority of requests. If each server gets one virtual node (A-1, B-1, C-1), placed at the positions shown in the figure above by the hash calculation, then some of the requests previously handled by server A are redirected to the virtual nodes B-1 and C-1 (the portions marked with a star in the figure), which means they are actually served by servers B and C.
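The smoothing effect can be sketched by hashing each server onto the ring many times under derived names ("A#0", "A#1", ...). The server names, key names, and the replica count of 160 (which happens to match Dubbo's default, discussed later) are illustrative choices:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of virtual nodes: each physical server occupies many ring positions.
class VirtualRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return (((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF)) & 0xFFFFFFFFL;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void addServer(String server, int replicas) {
        // All replicas map back to the same physical server.
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    String locate(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Route n synthetic keys and count how many land on each server.
    Map<String, Integer> distribution(int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            counts.merge(locate("key-" + i), 1, Integer::sum);
        }
        return counts;
    }
}
```

With many replicas per server, each server's share of the ring is the sum of many small arcs, so the shares concentrate around 1/N instead of being at the mercy of a handful of random positions.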
In the consistent hash algorithm, adding virtual nodes can solve the problem of data skew.
Figure 7 Consistent hashing ring in Dubbo
Nodes of the same color here belong to the same service provider, for example Invoker1-1, Invoker1-2, ..., Invoker1-160. By adding virtual nodes, we can spread the Invokers across the hash ring and thus avoid the data skew problem. Data skew refers to the situation where, because the nodes are not dispersed enough, a large number of requests land on the same node while other nodes receive only a few. For example:
Figure 8 Data skew problem
As shown above, because Invoker-1 and Invoker-2 are unevenly distributed on the ring, 75% of the system's requests fall on Invoker-1 and only 25% on Invoker-2. To solve this problem, virtual nodes can be introduced to balance the request volume across nodes.
With the background covered, let's start analyzing the source code, beginning with the doSelect method of ConsistentHashLoadBalance:
```java
public class ConsistentHashLoadBalance extends AbstractLoadBalance {

    private final ConcurrentMap<String, ConsistentHashSelector<?>> selectors =
            new ConcurrentHashMap<String, ConsistentHashSelector<?>>();

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        String methodName = RpcUtils.getMethodName(invocation);
        String key = invokers.get(0).getUrl().getServiceKey() + "." + methodName;
        // Get the original hashcode of the invokers list
        int identityHashCode = System.identityHashCode(invokers);
        ConsistentHashSelector<T> selector = (ConsistentHashSelector<T>) selectors.get(key);
        // If invokers is a new List object, the number of service providers has
        // changed (some may have been added or removed). In that case the
        // condition selector.identityHashCode != identityHashCode holds.
        if (selector == null || selector.identityHashCode != identityHashCode) {
            // Create a new ConsistentHashSelector
            selectors.put(key, new ConsistentHashSelector<T>(invokers, methodName, identityHashCode));
            selector = (ConsistentHashSelector<T>) selectors.get(key);
        }
        // Call ConsistentHashSelector's select method to pick an Invoker
        return selector.select(invocation);
    }

    private static final class ConsistentHashSelector<T> {
        // ...
    }
}
```
As shown above, the doSelect method mainly does preparatory work: it detects whether the invokers list has changed and creates a ConsistentHashSelector when needed. Once that is done, it calls the select method of ConsistentHashSelector to execute the load balancing logic. Before analyzing select, let's first look at how the consistent hash selector ConsistentHashSelector is initialized:
```java
private static final class ConsistentHashSelector<T> {

    // TreeMap storing the Invoker virtual nodes
    private final TreeMap<Long, Invoker<T>> virtualInvokers;

    private final int replicaNumber;

    private final int identityHashCode;

    private final int[] argumentIndex;

    ConsistentHashSelector(List<Invoker<T>> invokers, String methodName, int identityHashCode) {
        this.virtualInvokers = new TreeMap<Long, Invoker<T>>();
        this.identityHashCode = identityHashCode;
        URL url = invokers.get(0).getUrl();
        // Get the number of virtual nodes, 160 by default
        this.replicaNumber = url.getMethodParameter(methodName, "hash.nodes", 160);
        // Get the indices of the arguments that take part in the hash
        // calculation; by default only the first argument is hashed
        String[] index = Constants.COMMA_SPLIT_PATTERN.split(
                url.getMethodParameter(methodName, "hash.arguments", "0"));
        argumentIndex = new int[index.length];
        for (int i = 0; i < index.length; i++) {
            argumentIndex[i] = Integer.parseInt(index[i]);
        }
        for (Invoker<T> invoker : invokers) {
            String address = invoker.getUrl().getAddress();
            for (int i = 0; i < replicaNumber / 4; i++) {
                // Run md5 on address + i, producing a 16-byte digest
                byte[] digest = md5(address + i);
                // Hash parts of the digest 4 times, producing four distinct
                // non-negative longs
                for (int h = 0; h < 4; h++) {
                    // h = 0: combine bytes 0-3 of the digest
                    // h = 1: combine bytes 4-7 of the digest
                    // h = 2, h = 3: same pattern for bytes 8-11 and 12-15
                    long m = hash(digest, h);
                    // Store the hash -> invoker mapping in virtualInvokers.
                    // virtualInvokers must support efficient ordered lookups,
                    // hence the choice of TreeMap.
                    virtualInvokers.put(m, invoker);
                }
            }
        }
    }
}
```
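The md5 and hash helpers called in the constructor are not shown in the snippet. A self-contained sketch modeled on Dubbo's implementation (the wrapper class name HashHelpers is mine) looks like this:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the md5 and hash helpers used by ConsistentHashSelector,
// modeled on Dubbo's implementation; the wrapper class name is mine.
class HashHelpers {

    // Compute the 16-byte MD5 digest of a string.
    static byte[] md5(String value) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    // Combine 4 bytes of the digest (bytes 0-3 for number = 0, bytes 4-7 for
    // number = 1, and so on) into a non-negative long in [0, 2^32 - 1].
    static long hash(byte[] digest, int number) {
        return (((long) (digest[3 + number * 4] & 0xFF) << 24)
                | ((long) (digest[2 + number * 4] & 0xFF) << 16)
                | ((long) (digest[1 + number * 4] & 0xFF) << 8)
                | (digest[number * 4] & 0xFF))
                & 0xFFFFFFFFL;
    }
}
```

This is why the constructor loops only replicaNumber / 4 times: each 16-byte digest yields four ring positions.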
The constructor of ConsistentHashSelector performs a series of initialization steps, such as reading from the configuration the number of virtual nodes and the indices of the arguments that participate in the hash calculation; by default only the first argument is hashed. It is worth emphasizing that the load balancing logic of ConsistentHashLoadBalance is determined solely by the argument values: requests with identical argument values will be dispatched to the same service provider. Note also that ConsistentHashLoadBalance does not take weights into account.
After obtaining the virtual node count and the argument indices, the next step is to compute the hash of each virtual node and store the virtual nodes in the TreeMap, which completes the initialization of ConsistentHashSelector. Next, let's look at the logic of the select method.
```java
public Invoker<T> select(Invocation invocation) {
    // Turn the arguments into a key
    String key = toKey(invocation.getArguments());
    // Run md5 on the key
    byte[] digest = md5(key);
    // Hash the first four bytes of the digest, then pass the hash value to
    // selectForKey to find a suitable Invoker
    return selectForKey(hash(digest, 0));
}

private Invoker<T> selectForKey(long hash) {
    // Look up the first Invoker in the TreeMap whose position is greater
    // than or equal to the current hash
    Map.Entry<Long, Invoker<T>> entry = virtualInvokers.tailMap(hash, true).firstEntry();
    // If the hash is greater than the largest position on the ring,
    // entry is null, and the head entry of the TreeMap is used instead
    if (entry == null) {
        entry = virtualInvokers.firstEntry();
    }
    // Return the Invoker
    return entry.getValue();
}
```
As shown above, the selection process is fairly simple. First, run md5 and hash on the arguments to generate a hash value; then use that value to look up the target Invoker in the TreeMap.
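The wrap-around behavior of selectForKey can be demonstrated with a small TreeMap experiment; the ring positions 100/200/300 and the invoker names are made up:

```java
import java.util.Map;
import java.util.TreeMap;

// Demonstrates the lookup used by selectForKey: take the first entry at or
// after the given hash, and fall back to the head of the map when the hash
// lies past the last virtual node on the ring.
class RingLookupDemo {
    static String selectForKey(TreeMap<Long, String> ring, long hash) {
        Map.Entry<Long, String> entry = ring.tailMap(hash, true).firstEntry();
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around the ring
        }
        return entry.getValue();
    }
}
```

Because tailMap is called with inclusive = true, a hash that lands exactly on a virtual node selects that node; only a hash strictly beyond the last position wraps to the head.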
This concludes the analysis of ConsistentHashLoadBalance.
Before reading the ConsistentHashLoadBalance source code, readers are advised to brush up on the background knowledge first; otherwise the code logic will be quite hard to follow.
Author: Qiao Jie, JD Logistics
Source: JD Cloud Developer Community