This article brings you a detailed introduction (code example) about the implementation of consistent hashing algorithm in PHP. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.
1. Case Analysis
(1) Problem Overview
Assume that our image data is evenly distributed among three services (labeled as Server A and Server A respectively) B, server C) above, now we want to get the picture from it. After getting this request, how can the server specify whether this picture exists on server A, server B, or server C? If we want to traverse, two or three servers are okay, but that is too out. When the number of servers reaches hundreds or thousands, do we still dare to talk about traversing?
(2) Solution
a. Through storage mapping relationship
First of all, we may think that we can create an intermediate layer to record image storage On which server, as follows:
logo1.png =====》 Service A
ogo2.png =====》 Service B
logo3.png == ===》Service C
In this way, whenever a request comes, we first request the mapping relationship between the image and the server, find the server where the image is stored, and then make a request to the specified server. From an implementation point of view, this is feasible, but when storing pictures, we must also store the mapping relationship between the pictures and the server, which obviously increases the workload, and its maintenance is also a problem. Once the stored pictures and the server If there is a problem with the mapping relationship, the entire system will hang up.
b. Hash algorithm
Since we want to exclude the storage mapping relationship, at this time, people think of the hash algorithm. For example, when the
picture is stored, the hash value val is obtained through the hash algorithm based on the picture name (logo1.png), and the hash value val is obtained by taking the modulo of val. value, you can determine which server the image should be stored on. As follows:
key = hash(imgName) % n
Where:
imgName is the name of the image,
n is the number of servers,
key represents the number where the image should be stored on the server.
When a request comes, such as requesting the image logo1.png, the server can determine which server the logo1.png is stored on based on the key calculated by the above formula. The PHP implementation is as follows:
$hostsMap = ['img1.findme.wang', 'img2.findme.wang', 'img3.findme.wang']; function getImgSrc($imgName) { global $hostsMap; $key = crc32($imgName) % count($hostsMap); return 'http://' . $hostsMap[abs($key)] . '/' . $imgName; } //测试 var_dump(getImgSrc("logo1.png")); var_dump(getImgSrc("logo2.png")); var_dump(getImgSrc("logo3.png"));
Output:
At this point, we change from storing the mapping relationship to calculating the serial number of the server, which indeed greatly simplifies the work. quantity.
But once a new machine is added, it is very troublesome, because n changes, and almost all the serial number keys also change, so a lot of data migration work is required.
C. Consistent hash algorithm
Consistent hash algorithm is a special hash algorithm designed to solve the problem when the number of nodes (such as the number of servers storing pictures) ) changes, try to migrate as little data as possible.
The basic idea:
1. First, evenly distribute the points from 0 to 2 to the power of 32 on a ring, as follows:
2. Then calculate all the node nodes (servers that store pictures) through hash calculation, take the remainder of 232, and then map it to the hash ring, as follows:
3. When a request comes, for example, requesting the image logo1.png, after hash calculation, the remainder of 232 is taken, and then also mapped to the hash ring, as follows:
4. Then turn clockwise. The first node reached is considered to be the server that stores the logo1.png image.
As can be seen from the above, in fact, the highlight of consistent hashing is that the first is the hash calculation and mapping of both the node node (the server that stores the picture) and the object (picture), and the second is the closed-loop design.
Advantages: When a new machine is added, only the marked area is affected, as shown below:
Disadvantages: When there are relatively few nodes , often lacks balance, because after hash calculation, the nodes mapped to the hash ring are not evenly distributed, resulting in some machines having a high load and some machines being idle.
PHP implementation is as follows:
$hostsMap = ['img1.findme.wang', 'img2.findme.wang', 'img3.findme.wang']; $hashRing = []; function getImgSrc($imgName){ global $hostsMap; global $hashRing; //将节点映射到hash环上面 if (empty($hashRing)) { foreach($hostsMap as $h) { $hostKey = fmod(crc32($h) , pow(2,32)); $hostKey = abs($hostKey); $hashRing[$hostKey] = $h; } //从小到大排序,便于查找 ksort($hashRing); } //计算图片hash $imgKey = fmod(crc32($imgName) , pow(2,32)); $imgKey = abs($imgKey); foreach($hashRing as $hostKey => $h) { if ($imgKey <p>The output result is as follows: </p><p><img src="https://img.php.cn/upload/image/534/248/963/1551677736513489.png" title="1551677736513489.png" alt="Detailed introduction to implementing consistent hashing algorithm in PHP (code example)"></p><p>至于为什么使用fmod函数不适用求余运算符%,主要是因为pow(2,32)在32位操作系统上面,超高了PHP_INT_MAX,具体可以参考上一篇文章“PHP中对大数求余报错Uncaught pisionByZeroError: Modulo by zero”。</p><p>d、通过虚拟节点优化一致性hash算法</p><p>为了提高一致性hash算法的平衡性,我们首先能够想到的是,增加节点数,但是机器毕竟是需要经费啊,不是说增就能随意增,那就增加虚拟节点,这样就没毛病了。思路如下:</p><p>1、假设host1、host2、host3,都分别有3个虚拟节点,如host1的虚拟节点为host1_1、host1_2、host1_3</p><p>2、然后将所有的虚拟节点node(存储图片的服务器)通过hash计算后,对232取余,然后也映射到hash环上面,如下:</p><p><img src="https://img.php.cn/upload/image/891/291/577/1551677752319649.png" title="1551677752319649.png" alt="Detailed introduction to implementing consistent hashing algorithm in PHP (code example)"></p><p>然后,接下来步骤同一致性hash算法一致,只是最后需要将虚拟节点,转为真实的节点。</p><p>PHP实现如下:</p><pre class="brush:php;toolbar:false">$hostsMap = ['img1.findme.wang', 'img2.findme.wang', 'img3.findme.wang']; $hashRing = []; function getImgSrc($imgName){ global $hostsMap; global $hashRing; $virtualNodeLen = 3; //每个节点的虚拟节点个数 //将节点映射到hash环上面 if (empty($hashRing)) { foreach($hostsMap as $h) { $i = 0; while($i $h) { if ($imgKey <p>执行结果如下:</p><p><img src="https://img.php.cn/upload/image/947/947/423/1551677769758146.png" title="1551677769758146.png" alt="Detailed introduction to implementing consistent hashing algorithm in PHP (code example)"></p><p><strong>二、备注</strong><br>1、取模与取余的区别?</p><p>取余,遵循尽可能让商向0靠近的原则</p><p>取模,遵循尽可能让商向负无穷靠近的原则</p><p>1、什么是CRC算法?</p><p>CRC(Cyclical Redundancy Check)即循环冗余码校验,主要用于数据校验,常用的有CRC16、CRC32,其中16、32代表多项式最高次幂。</p><p></p>
The above is the detailed content of Detailed introduction to implementing consistent hashing algorithm in PHP (code example). For more information, please follow other related articles on the PHP Chinese website!