This article covers Redis high availability, from building the architecture to analyzing the principles behind it.

As part of my company's recent system optimization, we split our large tables some time ago, and now it is Redis's turn. One of the requirements is to migrate the Redis service from Alibaba Cloud to the company's own servers (due to the nature of the company), so I took the opportunity to review Redis's high-availability cluster architectures. Redis has three cluster modes: master-slave replication, Sentinel mode, and Cluster mode. Sentinel and Cluster are the ones most commonly used. Let's briefly go through all three.

Persistence mechanism

Before looking at the cluster architectures, we first need to introduce Redis's persistence mechanism, because the clusters rely on it. Redis persistence writes the data cached in memory to disk according to certain rules, so that the data can be recovered after the Redis service goes down, and so that master and slave nodes in a cluster can synchronize data. Redis offers two persistence methods, RDB and AOF, and version 4.0 introduced a hybrid persistence mode.


RDB is the persistence mechanism Redis enables by default. Based on user-configured rules of the form "at least Y changes within X seconds", it generates a snapshot and saves it to the binary file dump.rdb. By default, Redis ships with three such rules: at least 1 key change within 900 seconds, at least 10 key changes within 300 seconds, and at least 10,000 key changes within 60 seconds.
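As a rough illustration of how the three default rules interact, the sketch below treats each rule as a {seconds, changes} pair and reports whether any of them is satisfied. This is a simplified model, not Redis's actual code: the class name SaveRuleSketch and the method shouldSnapshot are hypothetical, and the exact boundary comparison Redis uses internally may differ.

```java
/** Simplified model of the "at least Y changes within X seconds" snapshot rule. */
final class SaveRuleSketch {
    // The three default rules quoted above, as {seconds, changes} pairs.
    static final long[][] DEFAULT_RULES = {{900, 1}, {300, 10}, {60, 10000}};

    /**
     * Hypothetical helper: returns true when at least one rule is satisfied,
     * i.e. enough time has passed since the last snapshot AND enough changes
     * have accumulated since then.
     */
    static boolean shouldSnapshot(long[][] rules, long secondsSinceLastSave, long changesSinceLastSave) {
        for (long[] rule : rules) {
            if (secondsSinceLastSave >= rule[0] && changesSinceLastSave >= rule[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSnapshot(DEFAULT_RULES, 60, 10000)); // true: the "60 seconds / 10000 changes" rule matches
        System.out.println(shouldSnapshot(DEFAULT_RULES, 60, 500));   // false: no rule matches yet
        System.out.println(shouldSnapshot(DEFAULT_RULES, 300, 500));  // true: the "300 seconds / 10 changes" rule matches
    }
}
```

The rules are independent: a busy instance trips the 60-second rule, while a quiet one is still covered by the 900-second rule.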


Besides automatic snapshots, Redis provides two commands for taking a snapshot of the in-memory data manually: save and bgsave.


  • save: takes the snapshot synchronously. When the amount of cached data is large, it blocks the execution of other commands, so it is inefficient.

  • bgsave: takes the snapshot asynchronously. The Redis main process forks a child process to take the snapshot, so other commands are not blocked and efficiency is higher. Because the snapshot is asynchronous, other commands may modify the data while it is in progress. To handle this, Redis relies on copy-on-write (COW). The child process is forked from the main process and therefore shares its memory. When data is modified during the snapshot, the affected memory is copied first: the main process writes to its own copy, while the child process keeps writing the data as it was at the moment of the fork to the dump.rdb file.

RDB snapshots are stored in binary form, so data recovery is fast, but there is a risk of data loss. Suppose the snapshot rule requires at least 100 changes within 60 seconds. If the Redis service goes down unexpectedly at the 50-second mark, all changes made during those 50 seconds are lost.


AOF is Redis's other persistence method. Unlike RDB, AOF records every command that modifies data and appends it to the appendonly.aof file on disk. When the Redis service restarts, it loads this file and re-executes the saved commands to restore the data. AOF is turned off by default and can be enabled in the redis.conf configuration file.

 # Disable AOF persistence
 # appendonly no
 # Enable AOF persistence
 appendonly yes
 # The name of the append only file (default: "appendonly.aof")
 appendfilename "appendonly.aof"

AOF offers three policies for flushing commands to disk. By default it uses appendfsync everysec.

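# Append to the AOF file on disk on every write command. Safest, but slowest.
# appendfsync always
# Append write commands to the AOF file on disk once per second. A crash loses only about 1 second of data.
appendfsync everysec
# Never flush proactively; the operating system decides when to write to disk. Least safe.
# appendfsync no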

After enabling AOF, restart the Redis service. From then on, every write command that is executed is recorded in the AOF file.


Compared with RDB, AOF offers better data safety, but as the service keeps running the AOF file grows larger and larger, so restoring data takes longer and longer. If both RDB and AOF are enabled, Redis gives priority to AOF when restoring data, since AOF loses less data.

                      RDB     AOF
Recovery efficiency   High    Low
Data security         Low     High
Space usage           Low     High



  # AOF rewrite: triggered automatically once the AOF file reaches 64 MB and has grown 100% since the last rewrite
  auto-aof-rewrite-percentage 100
  auto-aof-rewrite-min-size 64mb
  # Hybrid persistence: "yes" enables it, "no" disables it
  aof-use-rdb-preamble yes

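AOF rewrite means that as the AOF file grows, Redis automatically compacts the redundant commands in it to reduce the file size. For example, when tracking article view counts, every view executes an INCR command, so as the view count rises, more and more INCR commands accumulate in the AOF file. After an AOF rewrite, these redundant INCR commands are removed and replaced by a single set key value command. Besides the automatic rewrite, you can also trigger one manually with the bgrewriteaof command when needed.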
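The compaction idea can be sketched in a few lines: replay the logged commands into an in-memory map, then emit one SET per key with its final value. This is only an illustration under simplifying assumptions (a log holding nothing but whitespace-separated SET and INCR commands); the class AofRewriteSketch and its methods are hypothetical and do not reflect how Redis implements the rewrite internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of AOF compaction; not Redis's real rewrite logic. */
final class AofRewriteSketch {
    /** Replays "SET key value" and "INCR key" commands into an in-memory map. */
    static Map<String, String> replay(List<String> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String line : log) {
            String[] parts = line.trim().split("\\s+");
            if (parts[0].equalsIgnoreCase("SET")) {
                state.put(parts[1], parts[2]);
            } else if (parts[0].equalsIgnoreCase("INCR")) {
                // A missing key counts as 0 before incrementing.
                long current = Long.parseLong(state.getOrDefault(parts[1], "0"));
                state.put(parts[1], Long.toString(current + 1));
            }
        }
        return state;
    }

    /** Produces the compacted log: a single SET per key holding its final value. */
    static List<String> rewrite(List<String> log) {
        List<String> compact = new ArrayList<>();
        for (Map.Entry<String, String> entry : replay(log).entrySet()) {
            compact.add("SET " + entry.getKey() + " " + entry.getValue());
        }
        return compact;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "INCR article:1:views", "INCR article:1:views", "INCR article:1:views");
        System.out.println(rewrite(log)); // prints [SET article:1:views 3]
    }
}
```

Three logged INCR commands shrink to one SET, which is why the rewritten file is smaller and faster to replay.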






IP   Master/Slave node   Port   Version
                         6379   5.0.14
                         6379   5.0.14
                         6379   5.0.14
  1. Configure redis.conf on the slave nodes 36.130 and 36.131



  2. Start the Redis service on the master node 36.128
 ./src/redis-server redis.conf

  3. Start the Redis services on the slave nodes 36.130 and 36.131 in turn

 ./src/redis-server redis.conf

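After a successful start, the log shows that a connection to the Master node has been established. If the connection to the Master node is refused, first check whether the firewall is enabled on the Master node's server. If it is, either open port 6379 or turn the firewall off. If the firewall is off and the connection is still refused, edit redis.conf on the Master node: change bind to the machine's externally reachable network interface IP, or simply comment it out, then restart the service.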


  4. Check the status


 # Run on the master node to view the replication connection information
 info replication


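  • Full data synchronization: data is synchronized between master and slave nodes over a long-lived socket connection. When a Slave node starts, it establishes a long-lived connection to the Master node and sends the psync command to request synchronization. On receiving psync, the Master node runs bgsave to take an RDB snapshot of its in-memory data (this snapshot happens regardless of whether RDB is enabled in the conf file). Any write commands that arrive during the snapshot are stored by the Master node in the repl buffer. When the snapshot finishes, the Master sends the RDB file to the Slave node. The Slave node clears any old data it holds, loads the RDB file, and then receives the new commands the Master cached in the repl buffer. Once all these steps are complete, the master and slave are considered connected, and the Master node keeps streaming subsequent commands to the Slave node. Under high concurrency, the Slave's data may lag behind.

  • Partial data synchronization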
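The full synchronization flow described above (snapshot, buffer the writes that arrive meanwhile, load, then replay the buffer) can be modeled with plain in-memory maps. This is a toy, single-threaded sketch: the classes Master and Replica and all of their methods are hypothetical stand-ins, and no networking, psync handshake, or real RDB file is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of full synchronization; all names are hypothetical. */
final class FullSyncSketch {
    static final class Master {
        final Map<String, String> data = new HashMap<>();
        // Stand-in for the repl buffer: writes that arrive while a snapshot is in progress.
        final List<String[]> replBuffer = new ArrayList<>();
        boolean snapshotInProgress = false;

        void write(String key, String value) {
            data.put(key, value);
            if (snapshotInProgress) {
                replBuffer.add(new String[] {key, value});
            }
        }

        /** Stand-in for bgsave: returns a point-in-time copy of the dataset. */
        Map<String, String> beginSnapshot() {
            snapshotInProgress = true;
            return new HashMap<>(data);
        }

        /** Ships the snapshot to the replica, then replays the buffered writes in order. */
        void finishSync(Replica replica, Map<String, String> snapshot) {
            replica.load(snapshot);
            for (String[] write : replBuffer) {
                replica.data.put(write[0], write[1]);
            }
            replBuffer.clear();
            snapshotInProgress = false;
        }
    }

    static final class Replica {
        final Map<String, String> data = new HashMap<>();

        /** Drops any old data, then loads the snapshot. */
        void load(Map<String, String> snapshot) {
            data.clear();
            data.putAll(snapshot);
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        Replica replica = new Replica();
        replica.data.put("stale", "x");        // old data on the replica
        master.write("a", "1");
        Map<String, String> snapshot = master.beginSnapshot();
        master.write("b", "2");                // arrives during the snapshot, goes to the buffer
        master.finishSync(replica, snapshot);
        System.out.println(replica.data.equals(master.data)); // prints true
    }
}
```

The write to "b" is absent from the snapshot, so only replaying the buffer brings the replica fully up to date.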



  • Advantages

    1. Supports one master with multiple slaves and read/write splitting, which relieves read pressure on the Master node
    2. It is the foundation of the Sentinel and Cluster architectures
  • Disadvantages

    1. No automatic master-slave failover. When the Master node goes down, a new master has to be switched in manually.
    2. Data inconsistency arises easily. If the Master node goes down while some data has not yet been synchronized, that data is lost.

Sentinel mode

Sentinel mode builds on master-slave replication by adding separate sentinel processes that monitor the status of the servers in the master-slave architecture. When the Master node goes down, the sentinels elect a new Master node within a short period of time and perform the master-slave switch. In addition, when multiple sentinel nodes are deployed, the sentinels also monitor one another to detect whether any sentinel node is down.


Environment Configuration

IP               Master/Slave node   Port   Sentinel port   Version
                                            26379           5.0.14
192.168.36.130   Slave               6379   26379           5.0.14
                 Slave               6379   26379           5.0.14

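Master-slave replication is the foundation of Sentinel mode, so the master-slave configuration must be completed before setting up the sentinels. Once replication is in place, setting up the sentinels is much easier. Find the sentinel.conf file in the installation directory and modify it. Two settings mainly need to change: the sentinel port, and the IP address and port of the master node to monitor.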


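Once the setup succeeds, let's use code to check whether the sentinels automatically perform the master-slave failover when the master node goes down. Add the corresponding dependency to the pom of a Spring Boot project and configure the Redis sentinel information.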

server:
  port: 8081
spring:
  redis:
    sentinel:
      master: mymaster # master service name
      nodes: ,, # sentinel nodes
    timeout: 3000 # connection timeout

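@RestController
public class RedisTest {
    private static final Logger log = LoggerFactory.getLogger(RedisTest.class);
    @Autowired private StringRedisTemplate stringRedisTemplate;

    // Writes to Redis once per second; kill the master process midway to simulate downtime.
    @GetMapping("/test") // the path is an assumption; the original mapping was lost
    public void test(@RequestParam(name = "key") String key,
                     @RequestParam(name = "value") String value) throws InterruptedException {
        int idx = 0;
        while (true) { // loop and log calls reconstructed from the description and the log output
            try {
                stringRedisTemplate.opsForValue().set(key + idx, value);
                log.info("=====存储成功:{},{}=====", key + idx, value);
            } catch (Exception e) {
                log.error("====连接redis服务器失败:{}====", e.getMessage());
            }
            idx++;
            Thread.sleep(1000);
        }
    }
}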

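After starting the service, send data to the backend through the endpoint. The log prints 存储成功 (stored successfully) lines, which shows that the Redis sentinel cluster is running normally. Now kill the master node on the 36.128 machine to simulate an outage. The log shows 连接redis服务器失败 (failed to connect to the Redis server) errors, and after ten-odd seconds the sentinels have automatically performed the master-slave switch and the service is reachable again.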

2022-11-14 22:20:23.134  INFO 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : =====存储成功:test14,123=====
2022-11-14 22:20:24.142  INFO 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : =====存储成功:test15,123=====
2022-11-14 22:20:24.844  INFO 8764 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was /
2022-11-14 22:20:26.909  WARN 8764 --- [ioEventLoop-4-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to []: Connection refused: no further information: /
2022-11-14 22:20:28.165 ERROR 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : ====连接redis服务器失败:Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 3 second(s)====
2022-11-14 22:20:31.199  INFO 8764 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was
2022-11-14 22:20:52.189 ERROR 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : ====连接redis服务器失败:Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 3 second(s)====
2022-11-14 22:20:53.819  WARN 8764 --- [ioEventLoop-4-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to []: Connection refused: no further information: /
2022-11-14 22:20:56.194 ERROR 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : ====连接redis服务器失败:Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 3 second(s)====
2022-11-14 22:20:57.999  INFO 8764 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was
2022-11-14 22:20:58.032  INFO 8764 --- [ioEventLoop-4-4] i.l.core.protocol.ReconnectionHandler    : Reconnected to
2022-11-14 22:20:58.040  INFO 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : =====存储成功:test24,123=====
2022-11-14 22:20:59.051  INFO 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : =====存储成功:test25,123=====
2022-11-14 22:21:00.057  INFO 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : =====存储成功:test26,123=====
2022-11-14 22:21:01.065  INFO 8764 --- [nio-8081-exec-2] com.gz.redis.RedisTest                   : =====存储成功:test27,123=====



sentinel monitor master 6378 2


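  • When a sentinel determines that the master node is down, it sends the is-master-down-by-addr command to the other sentinels, asking to be made leader and to handle the failover.

  • On receiving the command, the other sentinels cast their votes.

  • If the sentinel that sent the command receives more than half of the votes, it becomes the leader and carries out the failover.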
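The "more than half of the votes" condition in the steps above can be sketched as a simple tally. This is a minimal illustration, assuming each sentinel casts exactly one vote per round and that sentinels are numbered 0 to n-1; the class SentinelElectionSketch and the method electLeader are hypothetical, and real Sentinel elections involve further details that are left out here.

```java
/** Toy vote tally for a sentinel leader election; names are hypothetical. */
final class SentinelElectionSketch {
    /**
     * votedFor[i] is the id (0..n-1) of the candidate that sentinel i voted for.
     * Returns the id of the candidate holding a strict majority, or -1 if
     * nobody has one, in which case another round would be needed.
     */
    static int electLeader(int[] votedFor) {
        int n = votedFor.length;
        int[] tally = new int[n];
        for (int candidate : votedFor) {
            tally[candidate]++;
        }
        for (int id = 0; id < n; id++) {
            if (2 * tally[id] > n) { // strictly more than half of all sentinels
                return id;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(electLeader(new int[] {1, 1, 2})); // prints 1 (2 of 3 votes)
        System.out.println(electLeader(new int[] {0, 1, 2})); // prints -1 (split vote)
    }
}
```

With three sentinels, two votes are enough, which is one reason an odd number of sentinels is a common choice.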




 # Require at least 3 slave nodes
 min-replicas-to-write 3
 # Replication and synchronization lag must not exceed 10 seconds
 min-replicas-max-lag 10
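One way to read these two settings together: the master keeps accepting writes only while enough slaves are connected and sufficiently up to date. The sketch below models that check under the assumption that the lag of each slave is known in whole seconds; the class MinReplicasSketch and the method acceptsWrites are hypothetical names used for illustration only.

```java
/** Toy model of the min-replicas-to-write / min-replicas-max-lag check. */
final class MinReplicasSketch {
    /**
     * Returns true when at least minReplicasToWrite slaves report a lag
     * of at most maxLagSeconds.
     */
    static boolean acceptsWrites(int[] replicaLagSeconds, int minReplicasToWrite, int maxLagSeconds) {
        int healthy = 0;
        for (int lag : replicaLagSeconds) {
            if (lag <= maxLagSeconds) {
                healthy++;
            }
        }
        return healthy >= minReplicasToWrite;
    }

    public static void main(String[] args) {
        // Settings from the configuration above: 3 slaves, 10 seconds.
        System.out.println(acceptsWrites(new int[] {1, 2, 3}, 3, 10));  // prints true
        System.out.println(acceptsWrites(new int[] {1, 2, 11}, 3, 10)); // prints false: only 2 slaves are within 10 seconds
    }
}
```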




Cluster Mode

In Sentinel mode, master-slave failover happens automatically when the master node goes down, but the switch takes ten seconds or more, during which some data may be lost. When concurrency is low, Sentinel mode is acceptable, but under high concurrency those ten-odd seconds can have serious consequences. For this reason, many Internet companies use the Cluster architecture. A Redis Cluster is made up of multiple groups of Redis nodes, each group consisting of one Master node and one or more Slave nodes. When storing data, Redis hashes the key and assigns it to a slot based on the result. A Cluster normally requires 6 nodes (three masters and three slaves).


Environment setup

Since there are only three virtual machines, each server needs to run two Redis services, on ports 6379 and 6380, which gives exactly 6 nodes.

IP               Master/Slave node   Port   Version
                 -                   6379   5.0.14
                 -                   6379   5.0.14
192.168.36.131                       6380   5.0.14

To keep things tidy, create a new folder for the cluster, copy the Redis files into it, and name the copies redis-6379 and redis-6380.

Next, edit each node's redis.conf. Find the cluster-related settings and change cluster-enabled to yes to turn on cluster mode. Then adjust the cluster node connection timeout cluster-node-timeout, the node configuration file name cluster-config-file, and so on. Note that services running on the same machine must use different port numbers.


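Although Redis is now running on every node, the nodes are not yet connected to each other. One last step is needed to link them: run the command below, --cluster create ip:port, on any one of the machines to create the cluster. When the command succeeds, it shows the slot distribution and the master-slave relationships.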

./src/redis-cli --cluster create --cluster-replicas 1



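spring:
  redis:
    cluster:
      nodes: ,,,,,
    # sentinel:
    #   master: mymaster
    #   nodes: ,,
    timeout: 3000
    lettuce: # pool parent keys reconstructed; the logs above show the Lettuce client
      pool:
        max-active: 80
        min-idle: 50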


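In Cluster mode there are multiple Master nodes, so when storing data Redis has to decide which machine the data goes to. After the cluster starts successfully, each Master node owns its own range of slots: Master[0] owns slots 0 - 5460, Master[1] owns 5461 - 10922, and Master[2] owns 10923 - 16383. Before storing a key, Redis computes its hash with CRC16 and applies a bitwise AND with 16383 to obtain the slot, so the slot is CRC16(key) & 16383. Knowing the slot alone does not tell the Java service which Redis instance owns it. At startup, however, the Java client caches the mapping between slots and node IPs locally, so once the slot is known, the target IP is known too.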
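The slot formula can be tried out directly. The sketch below implements the CRC16 variant used for key slots (CCITT/XMODEM: polynomial 0x1021, initial value 0) bit by bit and then maps the slot to one of the three masters using the ranges quoted above. The class KeySlotSketch and its method names are hypothetical; the sketch hashes the whole key, and the three-way split in masterFor only matches a cluster laid out exactly as in this article.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of CRC16(key) & 16383; class and method names are hypothetical. */
final class KeySlotSketch {
    static final int SLOT_COUNT = 16384;

    /** CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the value within 16 bits
            }
        }
        return crc;
    }

    /** Slot of a key: CRC16(key) & 16383. */
    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) & (SLOT_COUNT - 1);
    }

    /** Index of the master owning the slot, assuming the three ranges quoted above. */
    static int masterFor(int slot) {
        if (slot <= 5460) {
            return 0;
        }
        if (slot <= 10922) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        int slot = slotFor("123456789");
        System.out.println(slot);            // prints 12739 (CRC16 of "123456789" is 0x31C3)
        System.out.println(masterFor(slot)); // prints 2
    }
}
```

Because 16384 is a power of two, the bitwise AND with 16383 is equivalent to taking the CRC modulo 16384.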


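Elections in Cluster mode differ from those in Sentinel mode. When a slave node finds that its master has entered the fail state, it tries to start a failover. Because the failed master may have several slaves, multiple slaves may compete to become the new master. The election roughly proceeds as follows: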

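  • The slave node increments the cluster currentEpoch it has recorded by 1 and broadcasts a FAILOVER_AUTH_REQUEST message, notifying all nodes in the cluster that a new election is needed.

  • The other nodes receive the message, but only master nodes respond. Each master checks the validity of the requester and sends a FAILOVER_AUTH_ACK, and it sends only one ack per epoch.

  • The slave node that sent the request collects the FAILOVER_AUTH_ACK replies returned by the masters.

  • If the slave node receives acks from more than half of the masters, it is elected as the new Master node. After becoming master, it broadcasts a notification to the other cluster nodes.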
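Two rules carry the election steps above: each master acknowledges at most once per epoch, and a slave wins only with acks from more than half of the masters. The sketch below models just those two rules in memory. The names ClusterElectionSketch, MasterVoter, requestVote, and isElected are hypothetical, and the validity checks a real master performs on the requester are left out.

```java
/** Toy model of the cluster failover vote; all names are hypothetical. */
final class ClusterElectionSketch {
    /** A master that grants at most one FAILOVER_AUTH_ACK per epoch. */
    static final class MasterVoter {
        private long lastVoteEpoch = 0;

        /** Returns true (an ack) only for an epoch newer than the last one voted in. */
        boolean requestVote(long epoch) {
            if (epoch <= lastVoteEpoch) {
                return false;
            }
            lastVoteEpoch = epoch;
            return true;
        }
    }

    /** A slave is elected once its acks exceed half of the masters. */
    static boolean isElected(int acks, int totalMasters) {
        return 2 * acks > totalMasters;
    }

    public static void main(String[] args) {
        MasterVoter voter = new MasterVoter();
        System.out.println(voter.requestVote(5)); // prints true: first request in epoch 5
        System.out.println(voter.requestVote(5)); // prints false: already voted in epoch 5
        System.out.println(isElected(2, 3));      // prints true: 2 of 3 masters acked
    }
}
```

The one-ack-per-epoch rule is what stops two competing slaves from both reaching a majority in the same epoch.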



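  • There are multiple master nodes, so the architecture is decentralized.

  • Data is distributed across nodes by slot.

  • Better scalability and availability. Nodes can be added to or removed from the cluster online, and the official recommendation is to keep the number of nodes under 1000. When some Master nodes are unavailable, the cluster as a whole can still work normally.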


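  • Data is replicated asynchronously, so strong consistency is not guaranteed.

  • Slave nodes act as cold standbys in the cluster and cannot relieve read pressure.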


Redis is a very popular piece of middleware today. It can serve as a cache to reduce pressure on the database and improve system performance, as a distributed lock to ensure safety under concurrency, and as an MQ message queue to reduce coupling between systems. It supports standalone mode, master-slave replication, Sentinel mode, and Cluster mode. Each mode has its own advantages and disadvantages, so in real projects choose according to your business needs and level of concurrency.

