This article introduces three core Redis topics: master-slave replication, Sentinel, and clustering.

Take you through master-slave replication, Sentinel, and clustering in Redis

1. Master-slave replication

1. Introduction

Master-slave replication is the cornerstone of distributed Redis and the foundation of Redis high availability. In Redis, the server being replicated is called the master server (Master), and the server that replicates it is called the slave server (Slave).


Configuring master-slave replication is simple, and there are three ways to do it (ip is the master server's IP address and port is the master server's Redis service port):

Configuration file: add slaveof ip port to redis.conf
Command: run slaveof ip port from a Redis client connected to the slave
Startup parameter: ./redis-server --slaveof ip port

2. The evolution of master-slave replication

Redis master-slave replication was not as mature at the beginning as it is in the 6.x versions; it was refined from version to version. Broadly, it has gone through three iterations:

Before 2.8
2.8 to 4.0
After 4.0

With each version the replication mechanism improved, but in essence it always revolves around two operations, synchronization (sync) and command propagation (command propagate):

Synchronization (sync): updating the data state of the slave server to the current data state of the master server. It mainly happens during initialization or a later full resynchronization.
Command propagation (command propagate): when a write, delete, or similar command changes the master's data state and leaves master and slave inconsistent, the master propagates that command to the slave so that the two return to a consistent state.

2.1 Before version 2.8

2.1.1 Synchronization

In versions before 2.8, the slave server synchronizes with the master server by sending it a sync command:


The slave server receives the slaveof ip port command from a client and creates a socket connection to the master server at ip:port.
After the socket connects successfully, the slave server associates a file event handler dedicated to replication with the connection; this handler processes the RDB file and the propagated commands that the master server sends later.
Replication starts, and the slave server sends the sync command to the master server.
After receiving the sync command, the master server executes bgsave. A child process forked from the master's main process generates an RDB file, while the master records every write command executed after the snapshot began in a buffer.
When bgsave finishes, the master server sends the RDB file to the slave server, which loads it and reaches the master's state as of the snapshot.
The master server then sends the buffered write commands to the slave server, which executes them and catches up with the master's current state.

2.1.2 Command propagation

Once synchronization is complete, master and slave stay consistent through command propagation. For example, suppose synchronization has just finished and the master then receives DEL K6 from a client and deletes K6. K6 still exists on the slave, so the two are inconsistent. To restore consistency, the master propagates every command that changes its own data state to the slave for execution. Once the slave has executed the same command, the data states match again.
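
The two operations described so far, full synchronization followed by command propagation, can be modeled in a few lines. This is a toy sketch, not Redis code: the Master and Slave classes, the dict standing in for the dataset, and the copied dict standing in for the RDB file are all assumptions made for illustration.

```python
def apply_command(data, command, key, value=None):
    """Apply a write command to a dict that stands in for a Redis dataset."""
    if command == "SET":
        data[key] = value
    elif command == "DEL":
        data.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {command}")


class Slave:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        # Loading the "RDB file" replaces whatever the slave held before.
        self.data = dict(snapshot)


class Master:
    def __init__(self):
        self.data = {}
        self.slaves = []

    def handle_sync(self, slave):
        # Stand-in for BGSAVE plus RDB transfer: copy the current dataset.
        slave.load_snapshot(self.data)
        self.slaves.append(slave)

    def execute(self, command, key, value=None):
        # Change local state first, then propagate the same command.
        apply_command(self.data, command, key, value)
        for slave in self.slaves:
            apply_command(slave.data, command, key, value)


master = Master()
master.execute("SET", "K6", "v6")
slave = Slave()
master.handle_sync(slave)    # synchronization
master.execute("DEL", "K6")  # command propagation removes K6 on both sides
```

The sketch ignores the network entirely, which is exactly the simplification that hides the weakness of this design.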

2.1.3 Defects

Nothing above looks like a flaw in pre-2.8 replication, but that is only because network instability has not been considered yet. Readers familiar with distributed systems will know the CAP theorem, the cornerstone of distributed storage systems. In CAP, P (partition tolerance) is unavoidable because network partitions will happen, and Redis master-slave replication is no exception. Suppose a network failure cuts off communication between slave and master for a while. If the master's data changes during that period, the two servers are inconsistent when the slave reconnects. Before Redis 2.8, the only way to resolve this was for the slave to send sync again. sync does restore consistency, but it is a very expensive operation.

Resources consumed by master and slave when sync is executed:

The master server executes BGSAVE to generate the RDB file, which consumes a lot of CPU, disk I/O, and memory.
The master server sends the RDB file to the slave server, which consumes a lot of network bandwidth.
The slave server receives and loads the RDB file, during which it is blocked and cannot serve requests.

As these three points show, sync not only reduces the master's responsiveness but also makes the slave refuse external requests while it loads.

2.2 Versions 2.8 to 4.0

2.2.1 Improvement points

Starting with 2.8, Redis improved how a slave resynchronizes its data state after reconnecting. The goal is to reduce full resynchronizations and use partial resynchronization wherever possible. From 2.8 on, the psync command replaces sync for synchronization. psync supports both full and incremental synchronization:

Full synchronization behaves the same as in earlier versions (sync).
Incremental synchronization handles replication after a disconnect and reconnect according to the situation; when conditions allow, the master sends only the data the slave is missing.

2.2.2 How to implement psync

To support incremental synchronization after a slave disconnects and reconnects, Redis adds three auxiliary parameters:

Replication offset
Replication backlog
Server running id (run id)

2.2.2.1 Replication offset

Both the master server and the slave server maintain a replication offset:

When the master server propagates N bytes of data to a slave, it adds N to its own replication offset.
When the slave server receives N bytes of data from the master, it adds N to its own replication offset.

During normal synchronization, the offsets of the master and all of its slaves are equal. Comparing the replication offsets of master and slave therefore shows whether their data states are consistent. Suppose slaves A and B receive propagated data normally while slave C is disconnected: C's offset then falls behind the master's.

With replication offsets in place, once slave C reconnects, the master only needs to send the data C is missing (100 bytes in this example). But how does the master know which data the slave is missing?
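
The offset bookkeeping can be sketched as follows. The Node class and the propagate helper are hypothetical names for this illustration; only the rule itself (add N on send, add N on receive) comes from the text above.

```python
class Node:
    """A server that tracks only its replication offset."""

    def __init__(self):
        self.repl_offset = 0


def propagate(master, slaves, payload, connected):
    """Send payload to the connected slaves and advance offsets by its length."""
    master.repl_offset += len(payload)
    for name, slave in slaves.items():
        if name in connected:
            slave.repl_offset += len(payload)


def missing_bytes(master, slave):
    """How many bytes the slave still needs to match the master."""
    return master.repl_offset - slave.repl_offset


master = Node()
slaves = {"A": Node(), "B": Node(), "C": Node()}
propagate(master, slaves, b"x" * 200, connected={"A", "B", "C"})
propagate(master, slaves, b"y" * 100, connected={"A", "B"})  # C is offline
```

After the second call, slave C is exactly 100 bytes behind, while A and B are in step with the master.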

2.2.2.2 Replication backlog buffer

The replication backlog buffer is a fixed-length queue with a default size of 1MB. Whenever the master's data state changes, the master sends the change to the slaves and also writes a copy into the replication backlog buffer.

To support lookups by offset, the replication backlog buffer stores not only the data itself but also the replication offset of every byte.

When a slave reconnects after a disconnect, it sends its replication offset to the master through the psync command, and the master uses this offset to decide between incremental and full synchronization:

If the data at offset+1 is still in the replication backlog buffer, perform incremental synchronization.
Otherwise, perform full synchronization, as with sync.
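
A minimal sketch of that decision, assuming bytes are numbered from offset 1 and the backlog simply drops its oldest bytes when full. ReplicationBacklog and its methods are hypothetical names; the real buffer is a circular array inside the server.

```python
class ReplicationBacklog:
    """Fixed-size buffer holding the most recently propagated bytes."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray()
        self.end_offset = 0  # replication offset of the newest byte

    def feed(self, data):
        self.buf += data
        self.end_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]  # oldest bytes fall out of the queue

    @property
    def start_offset(self):
        # Replication offset of the oldest byte still in the buffer.
        return self.end_offset - len(self.buf) + 1

    def read_from(self, slave_offset):
        """Return the bytes after slave_offset, or None if a full sync is needed."""
        if slave_offset == self.end_offset:
            return b""  # slave is already up to date
        wanted = slave_offset + 1
        if wanted < self.start_offset or wanted > self.end_offset:
            return None
        return bytes(self.buf[wanted - self.start_offset:])


backlog = ReplicationBacklog(size=8)
backlog.feed(b"abcdefghij")       # offsets 1..10, only 3..10 are kept
partial = backlog.read_from(5)    # offset 6 is still buffered
too_old = backlog.read_from(1)    # offset 2 was dropped, so None
```

A None result corresponds to the "otherwise" branch above: the missing data is no longer buffered, so the master has to fall back to full synchronization.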

The default replication backlog buffer size is 1MB. How should a custom size be chosen? We want incremental synchronization to succeed as often as possible without the buffer taking too much memory. The buffer size S can be estimated from the time T (in seconds) a slave typically takes to reconnect after a disconnect, and the volume M of write commands the master receives per second:

S = 2 * M * T

The factor of 2 leaves some headroom so that most disconnect-and-reconnect cases can use incremental synchronization.
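
As a worked example of the formula, the helper below computes S for given M and T. The function name and the sample numbers (1MB of writes per second, 5 seconds to reconnect) are made up for illustration; the resulting value would be set with the repl-backlog-size directive in redis.conf.

```python
def backlog_size(write_bytes_per_sec, reconnect_seconds, headroom=2):
    """S = headroom * M * T, in bytes."""
    return headroom * write_bytes_per_sec * reconnect_seconds


MB = 1024 * 1024
size_bytes = backlog_size(write_bytes_per_sec=1 * MB, reconnect_seconds=5)
size_mb = size_bytes // MB  # 2 * 1MB/s * 5s = 10MB
```

With these inputs the estimate is 10MB, ten times the default, which shows how quickly a busy master can outgrow the 1MB default.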

2.2.2.3 Server run ID

At this point it may seem that incremental synchronization after a reconnect is already possible, so why is a run ID needed? There is one more case to consider: when the master server goes down, a slave server is promoted to be the new master. Comparing run IDs makes it possible to detect this case.

The run ID is a string of 40 random hexadecimal characters generated automatically when a server starts. Both master and slave servers generate one.
When a slave synchronizes with a master for the first time, the master sends its run ID to the slave, and the slave saves it in the RDB file.
When the slave reconnects after a disconnect, it sends the saved run ID to the master. If it matches the master's run ID, the master has not changed and incremental synchronization can be attempted.
If the run ID does not match, full synchronization is performed.
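
Putting the three parameters together, the master's decision can be sketched as a single function. The function name and its arguments are assumptions for this sketch; the +FULLRESYNC and +CONTINUE strings are the replies a Redis master gives to psync.

```python
def handle_psync(master_runid, backlog_start, backlog_end, slave_runid, slave_offset):
    """Decide between partial and full resynchronization.

    backlog_start and backlog_end are the replication offsets of the oldest
    and newest bytes currently held in the replication backlog buffer.
    """
    full = f"+FULLRESYNC {master_runid} {backlog_end}"
    # A slave that never replicated, or that last replicated from another
    # master, always needs a full resynchronization.
    if slave_runid != master_runid:
        return full
    # Partial resync is possible if offset+1 is still buffered, or if the
    # slave is already fully caught up.
    if backlog_start <= slave_offset + 1 <= backlog_end + 1:
        return "+CONTINUE"
    return full


runid = "a" * 40
reply = handle_psync(runid, 101, 500, slave_runid=runid, slave_offset=300)
```

Here the run ID check comes first because an offset is only meaningful relative to the replication stream of one particular master.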

2.2.3 Complete psync

The full psync flow is fairly involved, and it was already quite mature in the 2.8 to 4.0 implementation. The psync command takes the following parameters:

psync <runid> <offset>

When the slave server has not yet replicated from any master (this is not necessarily the first replication in the deployment, since the master may have changed, but it is this slave's first full synchronization), it sends:

psync ? -1

The complete psync process is as follows:

The slave server receives the SLAVEOF 127.0.0.1 6379 command.
The slave server returns OK to the command issuer (the operation is asynchronous: OK is returned first, and the address and port are saved afterwards).
The slave server saves the IP address and port in its Master Host and Master Port fields.
Using the Master Host and Master Port, the slave server opens a socket connection to the master server and associates a file event handler dedicated to replication with the connection, for later work such as receiving the RDB file.
The master server accepts the connection, creates the corresponding socket, and treats the slave as a client (in master-slave replication, master and slave are in fact each other's client and server).
Once the socket connection is established, the slave sends a PING command to the master. If the master returns PONG within the timeout, the connection is usable; otherwise the slave disconnects and reconnects.
If a master password is configured (masterauth), the slave sends AUTH <masterauth> to the master for authentication. If the slave sends a password but the master has none set, the master returns a "no password is set" error; if the master requires a password but the slave sends none, the master returns a NOAUTH error; if the passwords do not match, the master returns an "invalid password" error.
The slave sends REPLCONF listening-port xxxx (xxxx is the slave's port) to the master. The master saves this value and returns it when a client queries master-slave information with INFO replication.
The slave sends the psync command; see the two psync cases described above.
Master and slave act as each other's clients and exchange requests and responses.
Master and slave use a heartbeat mechanism to detect broken connections. Every second the slave sends REPLCONF ACK <offset> (the slave's replication offset) to the master. This also verifies that data is synchronized correctly: if the offsets are not equal, the master performs incremental or full synchronization to restore consistency (which one depends on whether the data at offset+1 is still in the replication backlog buffer).

2.3 Version 4.0

Replication in 2.8 to 4.0 still left room for improvement: can incremental synchronization be used when the master is switched? Redis 4.0 addressed this by upgrading psync to psync2. psync2 drops the server run ID in favor of replid and replid2: replid holds the ID of the current master, and replid2 holds the ID of the previous master. The auxiliary parameters become:

Replication offset
Replication backlog
Current master's ID (replid)
Previous master's ID (replid2)

With replid and replid2, incremental synchronization becomes possible across a master switch:

If the replid sent by the slave equals the current master's replid, choose between incremental and full synchronization as before.
If it does not match, check whether it equals replid2 (that is, whether the slave was replicating from the previous master). If so, incremental or full synchronization can still be chosen as before; if not, only full synchronization is possible.
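
The psync2 rule can be sketched by extending the earlier offset check with a second accepted ID. This follows the simplified description in this article; the function and argument names are assumptions, and real Redis additionally limits how far a replid2 match may be used with a second offset, which is omitted here.

```python
def psync2_decision(master_replid, master_replid2, backlog_start, backlog_end,
                    slave_replid, slave_offset):
    """Return "partial" or "full" for a reconnecting slave (simplified)."""
    # The slave may have replicated from the current master (replid) or
    # from the master that was in charge before the switch (replid2).
    known_history = slave_replid in (master_replid, master_replid2)
    # Same offset rule as before: offset+1 must still be in the backlog,
    # or the slave must already be fully caught up.
    in_backlog = backlog_start <= slave_offset + 1 <= backlog_end + 1
    return "partial" if known_history and in_backlog else "full"


decision = psync2_decision("new-id", "old-id", 101, 500,
                           slave_replid="old-id", slave_offset=300)
```

In this example the slave still carries the previous master's ID, yet it qualifies for a partial resynchronization because its data is still covered by the backlog.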

2. Sentinel

1. Introduction

Master-slave replication lays the foundation for distributed Redis, but on its own it does not provide high availability. In plain master-slave mode, if the master goes down, operations staff have to switch masters by hand, which is clearly not acceptable. For this, Redis officially provides a high-availability solution that tolerates node failures: Redis Sentinel. A Sentinel system consists of one or more Sentinel instances and can monitor any number of master servers and their slaves. When a monitored master goes down, Sentinel automatically takes it offline and promotes one of its slaves to be the new master.

For example, when the old Master has been offline longer than the user-configured limit, the Sentinel system performs a failover on it. Failover consists of three steps:

Choose the Slave with the most up-to-date data as the new Master.
Send new replication commands to the other Slaves so that they become Slaves of the new Master.
Keep monitoring the old Master; if it comes back online, make it a Slave of the new Master.

This article is based on the following resource list:

IP address        Node role                Port
192.168.211.104   Redis Master/Sentinel    6379/26379
192.168.211.105   Redis Slave/Sentinel     6379/26379
192.168.211.106   Redis Slave/Sentinel     6379/26379

2. Sentinel initialization and network connection

There is nothing magical about Sentinel: it is a simplified Redis server. On startup it loads a different command table and configuration file, so in essence Sentinel is a Redis server with fewer commands and some special functions. When a Sentinel starts, it goes through the following steps:

Initialize the Sentinel server
Replace the ordinary Redis code with Sentinel-specific code
Initialize the Sentinel state
Initialize the list of master servers to monitor from the user-supplied Sentinel configuration file
Create network connections to the master servers
Obtain slave server information from the masters and create network connections to the slaves
Obtain information about other Sentinels through publish/subscribe and create network connections between Sentinels

2.1 Initialize Sentinel server

Sentinel is essentially a Redis server, so starting Sentinel requires starting a Redis server, but Sentinel does not need to read the RDB/AOF file to restore the data state.

2.2 Replace ordinary Redis code with Sentinel-specific code

Sentinel supports only a small subset of Redis commands, and it has some special functions of its own. For this reason, Sentinel replaces the code used by an ordinary Redis server with Sentinel-specific code at startup, including a different command table. Sentinel does not support commands such as SET and DBSIZE, but it retains PING, PSUBSCRIBE, SUBSCRIBE, UNSUBSCRIBE, INFO, and a few others, which are the commands its work depends on.

2.3 Initializing Sentinel state

After loading Sentinel's unique code, Sentinel will initialize the sentinelState structure, which is used to store Sentinel-related status information, the most important of which is the masters dictionary.

struct sentinelState {

    // Current epoch, used for failover
    uint64_t current_epoch;

    // Masters monitored by this Sentinel
    // key   -> master name
    // value -> pointer to a sentinelRedisInstance
    dict *masters;

    // ...
} sentinel;

2.4 Initialize the list of master servers monitored by Sentinel

The list of master servers monitored by Sentinel is stored in the masters dictionary of sentinelState. Once sentinelState has been created, Sentinel starts initializing this list:

The key of masters is the name of the master server
The value of masters is a pointer to a sentinelRedisInstance

The master name is specified in the sentinel.conf configuration file. In the following configuration the master name is redis-master (the setup here is one master and two slaves):

daemonize yes
port 26379
protected-mode no
dir "/usr/local/soft/redis-6.2.4/sentinel-tmp"
sentinel monitor redis-master 192.168.211.104 6379 2
sentinel down-after-milliseconds redis-master 30000
sentinel failover-timeout redis-master 180000
sentinel parallel-syncs redis-master 1


A sentinelRedisInstance holds the information of one Redis server; masters, slaves, and Sentinels are all represented by this structure.

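typedef struct sentinelRedisInstance {

    // Flags identifying the type and state of this instance,
    // e.g. SRI_MASTER, SRI_SLAVE, SRI_SENTINEL
    int flags;

    // Instance name: the user-configured name for a master,
    // ip:port for slaves and Sentinels
    char *name;

    // Server run ID
    char *runid;

    // Configuration epoch, used for failover
    uint64_t config_epoch;

    // Instance address
    sentinelAddr *addr;

    // Time without a valid reply before the instance is considered subjectively down
    // sentinel down-after-milliseconds redis-master 30000
    mstime_t down_after_period;

    // Number of votes needed to consider the instance objectively down
    // sentinel monitor redis-master 192.168.211.104 6379 2
    int quorum;

    // Number of slaves that may sync with the new master at the same time during failover
    // sentinel parallel-syncs redis-master 1
    int parallel_syncs;

    // Maximum time allowed for refreshing the failover state
    // sentinel failover-timeout redis-master 180000
    mstime_t failover_timeout;

    // ...
} sentinelRedisInstance;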

With the one-master, two-slave configuration above, the masters dictionary contains a single entry: the key redis-master, pointing to the sentinelRedisInstance for 192.168.211.104:6379.

2.5 Create a network connection to the master server

After the instance structure is initialized, Sentinel creates network connections to the Master, becoming a client of the Master. Two connections are created between Sentinel and Master, a command connection and a subscription connection:

The command connection is used to obtain master and slave information.
The subscription connection is used to broadcast information among Sentinels. Every Sentinel subscribes to the __sentinel__:hello channel on the masters and slaves it monitors (note that Sentinels do not create subscription connections to each other; they learn each other's initial information through the __sentinel__:hello channel).

After the command connection is created, Sentinel sends an INFO command to the Master every 10 seconds. The Master's reply provides two kinds of information:

Information about the Master itself
Information about the Slaves under the Master

2.6 Create a network connection to the slave server

Using the slave information obtained from the master, Sentinel creates network connections to each Slave. As with the Master, a command connection and a subscription connection are created between Sentinel and Slave.

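Once the connections are established, Sentinel becomes a client of the Slave and also sends it an INFO command every 10 seconds to obtain server information. At this point Sentinel has the relevant data for both Master and Slaves. The most important fields are:

Server ip and port
Server run ID (run id)
Server role (role)
Master link status (master_link_status)
Slave replication offset (slave_repl_offset), used when electing a new Master during failover
Slave priority (slave_priority)

At this point the instance structure holds the Master together with the information of each of its Slaves.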

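2.7 Create network connections between Sentinels

How do Sentinels discover and communicate with each other? This is where the __sentinel__:hello channel comes in. Each Sentinel subscribes to __sentinel__:hello on every Master and Slave it monitors, and every 2 seconds it publishes a message to that channel in the following form:

PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

Here s stands for Sentinel and m for Master; ip is the IP address, port the port, runid the run ID, epoch the configuration epoch, and name the master name.

Multiple Sentinels are configured with the same master ip and port, so they all subscribe to the same __sentinel__:hello channel and can learn each other's ip and port from the messages received. Two points are worth noting:

If the runid in a message equals the Sentinel's own runid, the message was published by itself and is discarded.
Otherwise the message came from another Sentinel, and the receiver adds or updates that Sentinel's instance data based on its ip and port.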
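
The handling of a hello message can be sketched as follows. It assumes the eight comma-separated fields that Redis Sentinel publishes on the hello channel (Sentinel ip, port, run ID, and epoch, then master name, ip, port, and epoch); the Hello class, the on_hello function, and the dictionary keyed by ip:port are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Hello:
    s_ip: str
    s_port: int
    s_runid: str
    s_epoch: int
    m_name: str
    m_ip: str
    m_port: int
    m_epoch: int


def parse_hello(payload):
    """Split a hello payload into its eight fields."""
    f = payload.split(",")
    if len(f) != 8:
        raise ValueError("hello message must have 8 fields")
    return Hello(f[0], int(f[1]), f[2], int(f[3]), f[4], f[5], int(f[6]), int(f[7]))


def on_hello(my_runid, known_sentinels, payload):
    """Record the sender unless the message is our own. Returns True if recorded."""
    hello = parse_hello(payload)
    if hello.s_runid == my_runid:
        return False  # published by ourselves: discard
    # Add or update the other Sentinel, named ip:port like other instances.
    known_sentinels[f"{hello.s_ip}:{hello.s_port}"] = hello.s_runid
    return True


known = {}
on_hello("runid-104", known,
         "192.168.211.105,26379,runid-105,0,redis-master,192.168.211.104,6379,0")
```

After this call, known contains one entry for the Sentinel at 192.168.211.105:26379, which is all a Sentinel needs to open a command connection to it.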

Sentinels do not create subscription connections to each other, only command connections. After this step, the master's instance structure also records the other Sentinels that monitor it.

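3. Sentinel work

Sentinel's main job is to monitor Redis servers and switch to a new Master when the current Master has been unreachable beyond the configured limit. The work involves many details and breaks down into four steps: detecting whether the Master is subjectively down, detecting whether the Master is objectively down, electing a leader Sentinel, and failover.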

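3.1 Detecting whether the Master is subjectively down

Once per second, Sentinel sends a PING command to every Master, Slave, and Sentinel in its sentinelRedisInstance data and uses the replies to judge whether each one is still online.

sentinel down-after-milliseconds redis-master 30000

If an instance keeps returning invalid replies to Sentinel's PING for the whole period configured by down-after-milliseconds, the Sentinel considers it subjectively down. The down-after-milliseconds value configured for a master applies to that master and to all Slaves and Sentinels associated with it.

An invalid reply is anything other than +PONG, -LOADING, or -MASTERDOWN, including no reply at all.

When the Sentinel detects that the Master is subjectively down, it sets the SRI_S_DOWN flag in the flags field of the Master's sentinelRedisInstance.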
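
The subjective-down check can be sketched as a timestamp comparison. The Instance class and function names are hypothetical, times are plain millisecond integers supplied by the caller, and the bit value chosen for SRI_S_DOWN is illustrative only.

```python
VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}
SRI_S_DOWN = 1 << 3  # illustrative bit value for the subjectively-down flag


class Instance:
    def __init__(self, down_after_ms):
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = 0
        self.flags = 0


def on_ping_reply(inst, reply, now_ms):
    # Only valid replies refresh the timestamp; anything else is ignored,
    # which has the same effect as receiving no reply at all.
    if reply in VALID_REPLIES:
        inst.last_valid_reply_ms = now_ms


def check_subjectively_down(inst, now_ms):
    """Set or clear SRI_S_DOWN and return whether the flag is set."""
    if now_ms - inst.last_valid_reply_ms > inst.down_after_ms:
        inst.flags |= SRI_S_DOWN
    else:
        inst.flags &= ~SRI_S_DOWN
    return bool(inst.flags & SRI_S_DOWN)


master = Instance(down_after_ms=30000)
on_ping_reply(master, "+PONG", now_ms=1000)
on_ping_reply(master, "-ERR", now_ms=25000)  # invalid reply, timestamp unchanged
is_down = check_subjectively_down(master, now_ms=31001)
```

In this run the last valid reply is 30001 ms old at the time of the check, so the flag is set; a later +PONG would clear it again on the next check.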


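3.2 Detecting whether the Master is objectively down

A single Sentinel can only conclude that the Master is subjectively down. To decide whether it is objectively down, the Sentinel must ask the other Sentinels, and only when the number of Sentinels that consider the Master down (subjectively or objectively) reaches the configured quorum does it mark the Master as objectively down.

The Sentinel sends the following command to the other Sentinels recorded in its sentinelRedisInstance data:

SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>

ip: IP address of the Master judged subjectively down
port: port of the Master judged subjectively down
current_epoch: the sending Sentinel's current configuration epoch
runid: the sending Sentinel's run ID

current_epoch and runid are both used for Sentinel leader election: after the Master goes down, a leader Sentinel must be elected to choose the new Master, and these two parameters play a key role there, as explained later.

A Sentinel that receives the command checks whether the master is down according to the parameters and replies with three values:

down_state: the result of the check, 1 for down and 0 for not down
leader_runid: * when the exchange is only a down check; a run ID when it is a leader election
leader_epoch: the configuration epoch when leader_runid is a run ID; otherwise always 0

When a Sentinel detects that the Master is subjectively down and queries the other Sentinels, it sends current_epoch = 0 and runid = *.
The receiving Sentinel replies with down_state = 1 or 0, leader_runid = *, and leader_epoch = 0.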
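
Counting the replies against quorum can be sketched as below. The function name is hypothetical, and each reply is modeled as a (down_state, leader_runid, leader_epoch) tuple in the order listed above.

```python
def is_objectively_down(self_thinks_down, replies, quorum):
    """Decide whether the master is objectively down.

    replies holds one (down_state, leader_runid, leader_epoch) tuple per
    other Sentinel that answered is-master-down-by-addr.
    """
    if not self_thinks_down:
        return False
    # The asking Sentinel counts itself, then every reply with down_state == 1.
    votes = 1 + sum(1 for down_state, _, _ in replies if down_state == 1)
    return votes >= quorum


replies = [(1, "*", 0), (0, "*", 0)]
objectively_down = is_objectively_down(True, replies, quorum=2)
```

With quorum set to 2, as in the sentinel.conf shown earlier, one agreeing Sentinel in addition to the asking one is enough.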


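3.3 Electing the leader Sentinel

A down_state of 1 means the Sentinel that received is-master-down-by-addr also considers the Master subjectively down. If the number of such replies, plus the asking Sentinel itself, is greater than or equal to quorum (the value from the configuration file), the asking Sentinel marks the Master as objectively down. It then sends the command again:

SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>

This time runid is no longer * but the Sentinel's own run ID, meaning that the sender asks the receiving Sentinels to set it as leader. Requests are granted first come, first served: a Sentinel sets as its leader whichever Sentinel's request reaches it first. The sender checks each reply to see whether the replying Sentinel chose it as leader. If more than half of the Sentinels (the total can be read from the sentinels dictionary of the sentinelRedisInstance) have chosen it, the Sentinel considers itself the leader and starts the failover. Because a majority is required and each Sentinel sets only one leader per round, at most one leader can emerge. If no Sentinel reaches the majority, the election is repeated until a leader is produced.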
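
The first-come-first-served vote can be sketched as follows. The Sentinel class and wins_election function are hypothetical, and the sketch assumes that a new election round is identified by a larger epoch, so a Sentinel changes its vote only when the epoch increases.

```python
class Sentinel:
    def __init__(self, runid):
        self.runid = runid
        self.leader = None
        self.leader_epoch = 0

    def request_vote(self, candidate_runid, epoch):
        # The first request seen in a given epoch wins this Sentinel's vote;
        # later requests in the same epoch get the existing leader back.
        if epoch > self.leader_epoch:
            self.leader = candidate_runid
            self.leader_epoch = epoch
        return self.leader, self.leader_epoch


def wins_election(candidate, sentinels, epoch):
    """Ask every Sentinel (including the candidate) and require a majority."""
    votes = 0
    for s in sentinels:
        leader, leader_epoch = s.request_vote(candidate.runid, epoch)
        if leader == candidate.runid and leader_epoch == epoch:
            votes += 1
    return votes > len(sentinels) // 2


a, b, c = Sentinel("a"), Sentinel("b"), Sentinel("c")
group = [a, b, c]
b.request_vote("b", 1)              # b has already voted for itself
a_wins = wins_election(a, group, 1)  # a gets its own vote and c's vote
b_wins = wins_election(b, group, 1)  # b only has its own vote
```

Because each Sentinel grants a single vote per epoch and a strict majority is required, a and b cannot both win the same round.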

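3.4 Failover

Failover is handled entirely by the leader Sentinel, which has to do the following:

Choose the best Slave of the old Master as the new Master.
Make the other Slaves replicate from the new Master.
Keep monitoring the old Master and, if it comes back online, make it a Slave of the new Master.

The hardest part is choosing the best new Master. The leader Sentinel filters and sorts the candidates as follows:

Remove any Slave that is down from the Slave list.
Remove Slaves that have not replied to the Sentinel's INFO command within the last 5 seconds.
Remove Slaves whose connection to the failed master has been broken for more than down_after_milliseconds * 10.
Choose the Slave with the highest priority according to slave_priority.
If priorities are equal, choose the Slave with the largest replication offset (slave_repl_offset).
If offsets are equal as well, sort by run ID and choose the Slave with the smallest run ID.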
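
The filtering and sorting rules can be sketched as a filter followed by a single sort key. SlaveInfo and its field names are hypothetical. One assumption is worth stating: in Redis a lower slave_priority number means a higher priority, so "highest priority" is implemented here as the smallest number.

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    runid: str
    priority: int             # slave_priority: lower number = preferred
    repl_offset: int          # slave_repl_offset
    is_down: bool
    last_info_reply_ms: int   # time of the last INFO reply
    master_link_down_ms: int  # how long the link to the old master was broken


def select_new_master(slaves, now_ms, down_after_ms):
    """Return the best candidate, or None if no Slave qualifies."""
    candidates = [
        s for s in slaves
        if not s.is_down
        and now_ms - s.last_info_reply_ms <= 5000
        and s.master_link_down_ms <= down_after_ms * 10
    ]
    if not candidates:
        return None
    # Priority first, then the largest offset, then the smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.runid))


slaves = [
    SlaveInfo("ccc", 100, 500, False, 99000, 0),
    SlaveInfo("bbb", 100, 800, False, 99000, 0),
    SlaveInfo("aaa", 100, 800, False, 99000, 0),
    SlaveInfo("ddd", 50, 900, True, 99000, 0),  # down, so filtered out
]
best = select_new_master(slaves, now_ms=100000, down_after_ms=30000)
```

Here "aaa" and "bbb" tie on priority and offset, so the smaller run ID decides and "aaa" is selected.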

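Once the new Master is chosen, the leader Sentinel sends SLAVEOF ip port to the other Slaves of the failed master (excluding the new Master) so that they become Slaves of the new Master.

This completes the Sentinel workflow. If the new Master later goes down, the same process repeats.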

3. Cluster

1. Introduction

Redis Cluster is the distributed database solution provided by Redis. A cluster shares data through sharding, and its main goals are:

High performance and linear scalability up to 1000 nodes.
No merge operations (the same key never exists on multiple nodes), which works well for the large values typical of the Redis data model.
Write safety: the system tries to retain all writes from clients connected to the majority of nodes. Redis cannot guarantee that no data is ever lost, however, because master-slave replication is asynchronous and a window for losing writes always exists.
Availability: if a master node becomes unavailable, one of its slave nodes can take its place.

If you have no experience with Redis Cluster, it is recommended to read these three articles (in Chinese) first:

Redis cluster tutorial: REDIS cluster-tutorial, Redis Chinese Information Station, Redis China User Group (CRUG)
Redis cluster specification: REDIS cluster-spec, Redis Chinese Information Station, Redis China User Group (CRUG)
Three-master, three-slave pseudo-cluster deployment: "CentOS 7 stand-alone installation of Redis Cluster (3 masters, 3 slaves pseudo-cluster) in five simple steps", Li Ziba's blog, CSDN

The following content is based on a cluster of three masters and three slaves.

Resource list:

Node        IP and port             Slot range
Master[0]   192.168.211.107:6319    Slots 0 - 5460
Master[1]   192.168.211.107:6329    Slots 5461 - 10922
Master[2]   192.168.211.107:6339    Slots 10923 - 16383
Slave[0]    192.168.211.107:6349
Slave[1]    192.168.211.107:6359
Slave[2]    192.168.211.107:6369
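
2. Cluster internals

Redis Cluster does not use consistent hashing; it introduces the concept of hash slots instead. A cluster has 16384 hash slots, and each key is assigned to a slot by taking its CRC16 checksum modulo 16384. This structure makes it easy to add or remove nodes. Every node is responsible for a portion of the slots. For the three masters in the resource list above, the allocation is:

Master[0] holds hash slots 0 to 5460
Master[1] holds hash slots 5461 to 10922
Master[2] holds hash slots 10923 to 16383

Before studying Redis Cluster in depth, it helps to understand the internal structures of a Redis instance in cluster mode. When a Redis node enables cluster mode with cluster-enabled yes, it keeps using the server components of standalone mode and adds the clusterState, clusterNode, and clusterLink structures to hold cluster-specific data. Read the following three structures carefully, especially the comments; they give a good overall picture of how the cluster works.

2.1 clusterNode

clusterNode stores the information of one node, such as its name, IP address, port, and configuration epoch. The most important fields are:

typedef struct clusterNode {

    // Creation time
    mstime_t ctime;

    // Node name: 40 random hexadecimal characters
    // (same form as the server run ID described for Sentinel)
    char name[REDIS_CLUSTER_NAMELEN];

    // Node flags, identifying the node's role and state
    // Role  -> master or slave, e.g. REDIS_NODE_MASTER, REDIS_NODE_SLAVE
    // State -> online or offline, e.g. REDIS_NODE_PFAIL (possibly failing), REDIS_NODE_FAIL (failed)
    int flags;

    // Node configuration epoch, used for failover, similar to its use in Sentinel
    // (currentEpoch in clusterState is the epoch of the whole cluster)
    uint64_t configEpoch;

    // Node IP address
    char ip[REDIS_IP_STR_LEN];

    // Node port
    int port;

    // Connection information for this node
    clusterLink *link;

    // A 2048-byte bit array; each bit is 0 or 1
    // Bit i == 0: this node does not handle slot i
    // Bit i == 1: this node handles slot i
    unsigned char slots[16384/8];

    // Total number of slots handled by this node
    int numslots;

    // If this node is a slave: pointer to its master
    struct clusterNode *slaveof;

    // If this node is a master: number of slaves replicating it
    int numslaves;

    // Array of all slaves replicating this master
    struct clusterNode **slaves;

} clusterNode;

The field that may be hard to understand is slots[16384/8]. It can be thought of simply as an array of 16384 entries: a 1 at index i means slot i is handled by this clusterNode, and a 0 means it is not. Through slots, a clusterNode knows which slots its node is responsible for.

In a newly created clusterNode, or in a cluster where no slots have been assigned yet, every bit of slots is 0. For the cluster in the resource list above, the clusterNode representing Master[0] has bits 0 to 5460 set to 1 and all other bits set to 0.

2.2 clusterLink

clusterLink is a member of clusterNode. It stores the information needed to communicate with a node, such as the socket descriptor and the input and output buffers. The most important fields are:

typedef struct clusterLink {

    // Connection creation time
    mstime_t ctime;

    // TCP socket descriptor
    int fd;

    // Output buffer: messages waiting to be sent to the other node
    sds sndbuf;

    // Input buffer: messages received from the other node
    sds rcvbuf;

    // The node at the other end of this connection
    struct clusterNode *node;

} clusterLink;

2.3 clusterState

Every node has a clusterState structure, which stores everything the node knows about the cluster, such as the cluster state and the information of all nodes (masters and slaves). The most important fields are:

typedef struct clusterState {

    // Pointer to the clusterNode of the current node
    clusterNode *myself;

    // Current configuration epoch of the cluster, used for failover, similar to Sentinel
    uint64_t currentEpoch;

    // Cluster state: online or offline
    int state;

    // Number of nodes in the cluster that handle slots
    int size;

    // Dictionary of cluster nodes: all clusterNodes, including this one
    dict *nodes;

    // Assignment of every slot in the cluster
    clusterNode *slots[16384];

    // For resharding: slots this node is importing from other nodes
    clusterNode *importing_slots_from[16384];

    // For resharding: slots this node is migrating to other nodes
    clusterNode *migrating_slots_to[16384];

    // ...
} clusterState;

Three members of clusterState deserve close attention.

The first is the slots array. The slots array in clusterState is not the same as the slots array in clusterNode: the one in clusterNode records the slots handled by that particular node, while the one in clusterState records, for every slot in the cluster, which clusterNode handles it. When the cluster is working normally, each index of the clusterState slots array points to the clusterNode responsible for that slot; before slots are assigned, each index is null.

Redis Cluster keeps both slots arrays for performance reasons:

To find out which node handles a given slot, a lookup in the clusterState slots array is enough. Without that array, every clusterNode would have to be scanned, which is clearly slower.
The slots array in clusterNode is still needed because nodes have to tell each other which slots they handle, and for that they only need to send each other the clusterNode slots array.

The second is the nodes dictionary. The structure is simple, but it holds every clusterNode and is the main place where a node looks up information about the other masters and slaves in the cluster.

The third is the pair of arrays importing_slots_from[16384] and migrating_slots_to[16384]. They are used when the cluster is resharded and are covered later, where they fit better.

3. Cluster work

3.1 How to assign slots?

A Redis cluster has 16384 slots in total. In the three-master, three-slave cluster from the resource list above, each master is responsible for its own range of slots, yet the deployment process never showed slots being assigned to masters explicitly. That is because Redis Cluster divided the slots for us automatically. But what if we want to assign the slots ourselves?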
To assign one or more slots to a node, send it the following command:

    CLUSTER ADDSLOTS slot1 [slot2] … [slotN]

For example, to assign slots 0 and 1 to Master[0], we only need to send the following command to the Master[0] node:

    CLUSTER ADDSLOTS 0 1

When a node is assigned slots, the slots array of its clusterNode is updated. The node then sends the slots it is responsible for, that is, its slots array, to the other nodes in the cluster through messages. When the other nodes receive the message, they update the slots array of the corresponding clusterNode and the slots array of their clusterState.

3.2 How is ADDSLOTS implemented inside Redis Cluster?

This is actually fairly simple. When we send CLUSTER ADDSLOTS to a node, the node first uses the slots array in its clusterState to check whether any of the requested slots is already assigned to a node. If one is, the node stops and returns an error to the client that issued the command. If none of the requested slots is assigned yet, the node assigns them to itself. The assignment has three main steps:

Update the slots array of clusterState so that each specified slots[i] points to the current clusterNode.
Update the slots array of clusterNode so that the bit at each specified index i is set to 1.
Send a message to the other nodes in the cluster carrying the slots array of clusterNode. After receiving it, the other nodes update their clusterState slots array and the slots array of the corresponding clusterNode.

3.3 With so many nodes in the cluster, how does the client know which node to send a request to?

Before answering that, we first need to know how Redis Cluster works out which slot a key belongs to.
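The slot of a key is CRC16(key) mod 16384. As a minimal sketch of that computation, the code below implements the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0), which is the variant named in the Redis Cluster specification, using a plain bit-by-bit loop rather than a lookup table. The function names crc16 and key_hash_slot are chosen for this illustration, and hash tags (the {...} syntax that real Redis also honors when hashing keys) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no reflection.
 * Bitwise version for clarity; a table-driven version would be faster. */
uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        /* Mix the next byte into the high 8 bits of the register. */
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* HASH_SLOT = CRC16(key) mod 16384. Hash tags are not handled here. */
unsigned int key_hash_slot(const char *key, size_t len) {
    assert(key != NULL);
    return crc16(key, len) % 16384;
}
```

For example, key_hash_slot("foo", 3) gives 12182, so in the three-master layout above a key named foo would belong to Master[2], which holds slots 10923 to 16383.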
According to the official documentation, Redis does not use consistent hashing. Instead, each key is run through CRC16 and the result is taken modulo 16384 to decide which slot the key goes in:

    HASH_SLOT = CRC16(key) mod 16384

When a client connects to a node and sends a command, the node that receives the command first computes the slot i that the key belongs to. It then checks slots[i] in its clusterState to see whether it is responsible for slot i itself. If it is, the node serves the client's request. If it is not, the following steps take place:

The node returns a MOVED redirection error to the client. The error carries the ip and port of the clusterNode that is responsible for the key's slot.
When the client receives the MOVED error, it resends the command to the correct node using that ip and port. The whole process is transparent to the programmer; it is handled jointly by the server side and the client side of Redis Cluster.

3.4 What if I want to move a slot that is already assigned to node A over to node B?

This question covers several scenarios, such as removing nodes from the cluster or adding nodes to it. They all come down to moving hash slots from one node to another. The nice thing about Redis Cluster is that it supports doing this online, without stopping the service, which the official documentation calls live reconfiguration. Before doing it, let's take a look at the CLUSTER commands involved.
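Once you know the commands, you will be able to do the work yourself:

    CLUSTER ADDSLOTS slot1 [slot2] … [slotN]
    CLUSTER DELSLOTS slot1 [slot2] … [slotN]
    CLUSTER SETSLOT slot NODE node
    CLUSTER SETSLOT slot MIGRATING node
    CLUSTER SETSLOT slot IMPORTING node

These are the main CLUSTER commands for slot assignment. ADDSLOTS and DELSLOTS are mainly used to assign and remove slots quickly; we normally use them only for the initial assignment right after a cluster is created. CLUSTER SETSLOT slot NODE node also assigns a slot directly to the given node. Once the cluster is up and running, we usually reassign slots with the last two commands, which mean the following:

When a slot is set to MIGRATING, the node that originally holds the hash slot still accepts all requests for it, but it only processes a request if the key still exists on that node; otherwise the query is forwarded to the migration target through an -ASK redirection.
When a slot is set to IMPORTING, the node accepts queries for the hash slot only after it has received an ASKING command. If the client never sends ASKING, queries are redirected with a -MOVED error to the node that really handles the slot.

If those two statements are hard to follow, that is because they are the official description. Here is a plainer version. The overall flow is roughly:

redis-trib (the cluster management tool that takes care of slot assignment in Redis Cluster) sends CLUSTER SETSLOT slot IMPORTING node to the target node (the node importing the slot). The target node prepares to import the slot from the source node (the node exporting the slot).
redis-trib then sends CLUSTER SETSLOT slot MIGRATING node to the source node, which prepares to export the slot.
redis-trib then sends CLUSTER GETKEYSINSLOT slot count to the source node, which returns up to count keys that belong to the slot.
For each key returned, redis-trib sends MIGRATE ip port key 0 timeout to the source node. If the key is on the source node, it is migrated to the target node. This step and the previous one repeat until the slot has no keys left on the source node.
After the migration completes, redis-trib sends CLUSTER SETSLOT slot NODE node to a node in the cluster. That node updates its clusterNode and clusterState structures and then spreads the new slot assignment through messages. At this point the slot migration is finished, and the other nodes in the cluster have also picked up the new assignment.

3.5 What if the slot of the key a client asks for is being migrated?

Sharp readers will always think of this kind of concurrent situation. Well spotted! The official design has accounted for it too. Remember the clusterState structure from earlier? importing_slots_from and migrating_slots_to exist to handle exactly this problem.

    typedef struct clusterState {
        // ...

        // Used for resharding: slots this node is importing from other nodes
        clusterNode *importing_slots_from[16384];

        // Used for resharding: slots this node is migrating to other nodes
        clusterNode *migrating_slots_to[16384];

        // ...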
    } clusterState;

When a node is exporting a slot, the entry at that slot's index in the migrating_slots_to array of its clusterState is set to point to a clusterNode, namely the node that is importing the slot. When a node is importing a slot, the entry at that slot's index in the importing_slots_from array of its clusterState is set to point to the clusterNode of the node that is exporting it. With these two arrays a node can tell whether a slot is being migrated, where it is coming from, and where it is going. It really is that simple.

Now back to the question. Suppose the key a client asks for belongs to a slot that is being migrated. The node that receives the command first tries to find the key in its own database. If the migration of the slot has not finished and this particular key has not been moved yet, the node simply serves the request. If the key is no longer there, the node looks at the entry for the slot in migrating_slots_to. If that entry is not null but points to a clusterNode, the key has most likely already been migrated to that node. In that case the node stops processing the command and returns an ASK error, which carries the ip and port of the clusterNode that is importing the slot. After receiving the ASK error, the client has to redirect the request to that node, but there is one point to watch out for here.

As mentioned before, when a node finds that a slot is not its responsibility, it returns a MOVED error. So how is a slot in the middle of a migration handled? This is what ASK is for. When the node finds that the slot is being migrated, it returns an ASK error to the client, containing the ip and port of the clusterNode the slot is being migrated to. The client then first sends an ASKING command to that target node. The purpose of ASKING is to tell the target node: make an exception and handle this request, because the slot is being migrated to you and you cannot simply turn me away. (Without ASKING, the target node would just consult its clusterState; the slot being migrated has not been updated there yet, so it could only answer with MOVED, and the client would bounce back and forth between the two nodes.) A node that has received ASKING will force-execute the next request once (only once; the client has to send ASKING again before the next such request).

4. Cluster failure

Failure handling in a Redis cluster is relatively simple. As with Sentinel, it comes into play when a master node goes down or does not respond within the configured maximum time.
A new master is then elected from among the slaves of the failed master, in much the same way as in Sentinel. Of course, this presupposes that every master in the cluster was given slaves in advance; otherwise there is nothing to fail over to. The general steps are as follows:

In a normally working cluster, every node periodically sends PING to the other nodes. If the node that receives the PING does not reply with PONG within the specified time, the sender sets REDIS_NODE_PFAIL in the flags of the clusterNode that represents that node. PFAIL does not mean offline, only suspected offline.
Cluster nodes tell each other, through messages, the state they have recorded for each node in the cluster.
If more than half of the masters that are responsible for slots have marked a given master as suspected offline, that master is marked as offline: REDIS_NODE_FAIL is set in the flags of its clusterNode. FAIL means the node is offline.
Cluster nodes again tell each other, through messages, the state of each node in the cluster. At this point the slaves of the offline master find out that their master has been marked as offline, and it is time for them to step up.
The slaves of the offline master elect one of themselves as the new master. The selected slave executes SLAVEOF no one and becomes the new master.
The new master revokes all slot assignments of the original master and assigns those slots to itself, that is, it modifies the clusterNode and clusterState structures.
The new master broadcasts a PONG message to the cluster.
Other nodes thereby learn that there is a new master and update their clusterNode and clusterState structures.
The remaining slaves of the original master switch to replicating the new master and become its slaves.
Finally, the new master takes over serving the slots that the original master was responsible for.

I have written this part rather loosely. If you need to dig into the details, be sure to read this article:

REDIS cluster-spec - Redis Chinese Information Station - Redis China User Group (CRUG)
http://redis.cn/topics/cluster-spec.html

Or you can read Huang Jianhong's book "Redis Design and Implementation", which is very well written and which I drew on heavily.
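The step in section 4 where a suspected failure becomes a confirmed one can be sketched in a few lines. This is only an illustration of the "more than half of the slot-serving masters" rule as described above: the flag values, the function names, and the idea of passing the report count in as a plain integer are all assumptions of this sketch, and the real cluster gathers those reports from the messages nodes exchange.

```c
#include <assert.h>

/* Illustrative flag bits; the values are not the ones Redis uses. */
#define NODE_PFAIL 1 /* suspected offline */
#define NODE_FAIL  2 /* confirmed offline */

/* Reports needed to confirm a failure: more than half of the masters
 * that are responsible for slots. */
int needed_quorum(int voting_masters) {
    assert(voting_masters > 0);
    return voting_masters / 2 + 1;
}

/* Mark a node as suspected offline if it has not answered a PING in time. */
int mark_pfail_if_silent(int flags, long long now_ms, long long ping_sent_ms,
                         long long timeout_ms) {
    if (now_ms - ping_sent_ms > timeout_ms) flags |= NODE_PFAIL;
    return flags;
}

/* Promote PFAIL to FAIL once enough slot-serving masters report the node
 * as suspected offline. Returns the updated flags. */
int update_failure_flags(int flags, int pfail_reports, int voting_masters) {
    if (pfail_reports >= needed_quorum(voting_masters)) {
        flags &= ~NODE_PFAIL;
        flags |= NODE_FAIL;
    }
    return flags;
}
```

With the three masters from the resource list, needed_quorum(3) is 2, so one master suspecting a failure is not enough; a second report is what turns PFAIL into FAIL.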

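As a recap of sections 3.3 and 3.5, the decision a node makes for an incoming command (serve it, answer MOVED, or answer ASK) can be sketched as one function. Everything here is a simplified assumption for illustration: the node and cluster_state types keep only the fields the decision needs, route and the decision enum are invented names, and cases such as unassigned slots or multi-key commands are not handled.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SLOTS 16384

/* Simplified, hypothetical versions of clusterNode and clusterState. */
typedef struct node { const char *ip; int port; } node;

typedef struct {
    node *myself;
    node *slots[CLUSTER_SLOTS];                /* slot -> owner */
    node *migrating_slots_to[CLUSTER_SLOTS];   /* set while exporting a slot */
    node *importing_slots_from[CLUSTER_SLOTS]; /* set while importing a slot */
} cluster_state;

typedef enum { SERVE, REDIRECT_MOVED, REDIRECT_ASK } decision;

/* key_exists: whether the key is present in this node's own database.
 * asking:     whether the client sent ASKING right before this command.
 * *target receives the node to redirect to; it is left unchanged for SERVE. */
decision route(const cluster_state *cs, int slot, int key_exists, int asking,
               node **target) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);

    if (cs->slots[slot] == cs->myself) {
        /* We own the slot. If it is being exported and the key is already
         * gone, point the client at the importing node with ASK. */
        if (!key_exists && cs->migrating_slots_to[slot] != NULL) {
            *target = cs->migrating_slots_to[slot];
            return REDIRECT_ASK;
        }
        return SERVE;
    }

    /* Not our slot, but we are importing it and the client said ASKING:
     * make the one-time exception. */
    if (asking && cs->importing_slots_from[slot] != NULL) return SERVE;

    /* Otherwise send the client to the recorded owner. */
    *target = cs->slots[slot];
    return REDIRECT_MOVED;
}
```

Note how the importing node answers MOVED when asking is 0: its clusterState still lists the source node as the owner, which is exactly why the client has to send ASKING first.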

