
A few arguments about Redis Sentinel properties an

Jun 07, 2016, 04:39 PM
redis

Yesterday distributed systems expert Aphyr posted a tweet about a Redis Sentinel issue experienced by an unknown company (that wishes to remain anonymous):

“OH on Redis Sentinel "They kill -9'd the master, which caused a split brain..."
“then the old master popped up with no data and replicated the lack of data to all the other nodes. Literally had to restore from backups."

OMG, we have some nasty bug, I thought. However when I tried to get more information from Kyle, he replied that the users had actually disabled disk persistence entirely on the master process. Yep: the master was configured on purpose to restart with a wiped data set.

Guess what? A Twitter drama immediately started. People were deeply worried for Redis users. Poor Redis users! Always in danger.

However, while being very worried is a trait of sure wisdom, I want to take the other path: providing some more information. Moreover this blog post is interesting to write because Kyle, while reporting the problem with little context, was able a few tweets later to isolate what is, IMHO, the true aspect that could be improved in Redis Sentinel. It is not related to the described incident, and it has been in my TODO list for a long time now.

But first, let's look a bit more closely at the behavior of Redis / Sentinel in the drama-incident.

Welcome to the crash recovery system model
===

Most real world distributed systems must be designed to be resilient to the fact that processes can restart at random. Note that this is very different from the problem of being partitioned away, that is, the inability to exchange messages with other processes. It is, instead, a matter of losing state.

To be more accurate about this problem, we could say that if a distributed algorithm is designed so that a process must guarantee to preserve its state after a restart, and it fails to do this, it is technically experiencing a byzantine failure: the state is corrupted, and the process is no longer reliable.

Now in a distributed system composed of Redis instances, and Redis Sentinel instances, it is fundamental that rebooted instances are able to restart with the old data set. Starting with a wiped data set is a byzantine failure, and Redis Sentinel is not able to recover from this problem.

But let's take a step back. Actually Redis Sentinel may not be directly involved in an incident like that. The typical example is what happens if a misconfigured master restarts fast enough so that no failure is detected at all by the Sentinels.

1. Node A is the master.
2. Node A is restarted, with persistence disabled.
3. Sentinels (may) see that Node A is not reachable… but not for long enough to reach the configured timeout.
4. Node A is available again, except it restarted with a totally empty data set.
5. All the slave nodes B, C, D, ... will happily synchronize an empty data set from it.

Everything wiped from the master, as per configuration, after all. And everything wiped from the slaves, which are replicating from what is believed to be the current source of truth for the data set.
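As an aside, this specific misconfiguration is easy to spot from the outside. Below is a minimal sketch, assuming the redis-py client (host and port are placeholders), that flags an instance running with both RDB snapshotting and AOF disabled, which is exactly the setup in the time line above:

    # Minimal check for the misconfiguration described above: an instance
    # with both RDB snapshotting and AOF disabled will come back empty
    # after a restart. Assumes the redis-py client; host/port are placeholders.
    import redis

    def persistence_disabled(host="127.0.0.1", port=6379):
        r = redis.Redis(host=host, port=port)
        save = r.config_get("save").get("save", "")              # "" means RDB snapshotting is off
        aof = r.config_get("appendonly").get("appendonly", "no")
        return save == "" and aof == "no"

    if persistence_disabled():
        print("WARNING: this instance will restart with an empty data set")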

Let’s remove Sentinel from the equation, that is, point “3” of the above time line, since Sentinel did not act at all in the example scenario.

This is what you get. You have a Redis master replicating with N slaves. The master is restarted, configured to start with a fresh (empty) data set. Slaves replicate again from it (an empty data set).
I think this is not big news for Redis users: this is how Redis replication works, slaves will always try to be the exact copy of their masters. However let’s consider alternative models.

For example Redis instances could have a Node ID which is persisted in the RDB / AOF file. Every time the node restarts, it loads its Node ID. If the Node ID is wrong, slaves won’t replicate from the master at all. Much safer, right? Only marginally, actually. The master could have a different misconfiguration, so after a restart it could reload a data set which is weeks old, since snapshotting failed for some reason.
So after a bad restart we still have the right Node ID, but the data set is so old that it is basically the same as being wiped, more or less, just more subtle to detect.

However, at the cost of making things only marginally more secure, we now have a system that may be more complex to operate, and slaves that are in danger of not replicating from the master because the ID does not match, due to operational errors similar to disabling persistence, except a lot less obvious than that.
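Just to make that alternative concrete, here is a tiny, purely hypothetical sketch of the idea; this Node ID mechanism is not a Redis feature, and the file name and helpers below are made up only for illustration:

    # Purely hypothetical sketch of the "Node ID" alternative discussed above.
    # This is NOT how Redis works: the node-id file and the helpers are
    # made-up names used only to illustrate the idea.
    import os
    import uuid

    ID_FILE = "node-id"   # imagined to be persisted together with the RDB / AOF

    def load_or_create_node_id():
        if os.path.exists(ID_FILE):
            with open(ID_FILE) as f:
                return f.read().strip()
        node_id = uuid.uuid4().hex
        with open(ID_FILE, "w") as f:
            f.write(node_id)
        return node_id

    def slave_should_replicate(expected_master_id, advertised_master_id):
        # Refuse to resync if the master came back with a different identity.
        return expected_master_id == advertised_master_id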

So, let’s change topic, and see a failure mode where Sentinel is *actually* involved, and that can be improved.

Not all replicas are the same
===

Technically Redis Sentinel offers a very limited set of simple to understand guarantees.

1) All the Sentinels will agree about the configuration as soon as they can communicate. Actually each sub-partition will always agree.
2) Sentinels can’t start a failover without an authorization from the majority of Sentinel processes.
3) Failovers are strictly ordered: if a failover happened later in time, it has a greater configuration “number” (config epoch in Sentinel slang), that will always win over older configurations.
4) Eventually the Redis instances are configured to map with the winning logical configuration (the one with the greater config epoch).

This means that the data set semantics is “last failover wins”. However the missing information here is: during a failover, which slave is picked to replace the master? This is, all in all, a fundamental property. For example if Redis Sentinel fails by picking a wiped slave (one that just restarted with a wrong configuration), *that* is a problem with Sentinel. Sentinel should make sure that, even within the limits of Redis being an asynchronously replicated system, it acts in the best interest of the user by picking the best slave around, and refusing to fail over at all if there is no viable slave reachable.

This is a place where improvements are possible. This is what happens today to select a slave when the master is failing (a simplified sketch in code follows the list):

1) If a slave was restarted, and never connected with the master after the restart to perform a successful synchronization (data transfer), it is skipped.
2) If the slave has been disconnected from its master for more than 10 times the configured timeout (the time a master should be unreachable for the set of Sentinels to detect it as failing), it is considered not eligible.
3) Out of the remaining slaves, Sentinel picks the one with the best “replication offset”.
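A simplified sketch of that selection, for illustration only: each candidate slave is represented by a small record with the fields used in the list above (the field names are made up, and the real logic lives in the Sentinel C code and checks more conditions):

    # Simplified sketch of today's slave selection, for illustration only.
    # Candidates are dicts with made-up field names; the real Sentinel
    # implementation checks more conditions than these three.

    def pick_slave(candidates, down_after_ms):
        eligible = []
        for s in candidates:
            if not s["synced_since_restart"]:                    # rule 1: never synced after a restart
                continue
            if s["master_link_down_ms"] > 10 * down_after_ms:    # rule 2: disconnected for too long
                continue
            eligible.append(s)
        if not eligible:
            return None                                          # no viable slave: do not fail over
        # rule 3: the best (highest) replication offset wins
        return max(eligible, key=lambda s: s["repl_offset"])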

The replication offset is a number that Redis master-slave replication uses to keep a count of the number of bytes sent via the replication channel. It is useful in many ways, not just for failovers. For example in partial resynchronizations after a net split, slaves will ask the master: give me data starting from offset X, which is the last byte I received, and so forth.
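The offset is easy to observe, since it is exposed by INFO replication on both masters and slaves. A minimal example, assuming the redis-py client (the address is a placeholder):

    # Reading the replication offsets exposed by INFO replication.
    # Assumes the redis-py client; 127.0.0.1:6379 is just a placeholder.
    import redis

    r = redis.Redis(host="127.0.0.1", port=6379)
    info = r.info("replication")

    # On a master, master_repl_offset counts the bytes sent on the replication
    # stream; on a slave, slave_repl_offset is how much of that stream it has
    # processed so far.
    print(info.get("role"))
    print(info.get("master_repl_offset"))
    print(info.get("slave_repl_offset"))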

However this replication number has two issues when used in the context of picking the best slave to promote.

1) It is reset after restarts. This sounds harmless at first, since we want to pick the slave with the highest offset, and anyway, after a restart, if a slave can’t connect it is skipped. However it is not harmless at all, read on.
2) It is just a number: it does not imply that a Redis slave replicated from *a given* master.

Also note that when a slave is promoted to master, it inherits the master’s replication offset. So modulo restarts, the number keeps increasing.

Why are “1” and/or “2” suboptimal choices that can be improved?

Imagine this setup. We have nodes A B C D E.
D is the current master, and is partitioned away with E in a minority partition. E still replicates from D, everything is fine from their POV.

However in the majority partition, A B C can exchange messages, and A is elected master.
Later A restarts, resetting its offset. B and C replicate from it, starting again with lower offsets.
After some time A fails, and, at the same time, E rejoins the majority partition.

E has a data set that is less up to date compared to the B and C data sets, however its replication offset is higher. Not just that: E can actually claim it was recently connected to its master.

To improve upon this is easy. Each Redis instance has a “runid”, a unique ID that changes for each new Redis run. This is useful in partial resynchronizations in order to avoid getting an incremental stream from a wrong master. Slaves should publish which master run id they last replicated from successfully, and the Sentinel failover should make sure to only pick slaves that replicated from the master they are actually failing over.

Once you tie the replication offset to a given runid, what you get is an *absolute* meter of how up to date a slave is. If two slaves are available, and both can claim continuity with the old master, the one with the higher replication offset is guaranteed to be the best pick.
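Here is a sketch of what the improved selection could look like, again with made-up record fields for illustration: every slave advertises the runid of the last master it successfully replicated from, and only the slaves matching the runid of the failing master are considered:

    # Sketch of the proposed improvement: only consider slaves that can claim
    # continuity with the failing master (matching runid), then pick the one
    # with the highest replication offset. Field names are made up.

    def pick_slave_with_runid(candidates, failing_master_runid):
        continuous = [s for s in candidates
                      if s["last_master_runid"] == failing_master_runid]
        if not continuous:
            return None     # no slave with proven continuity: refuse to fail over
        return max(continuous, key=lambda s: s["repl_offset"])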

However this also creates availability concerns in all the cases where data is not very important but availability is. For example when A crashes, if only E is available, even if it used to replicate from D, it is still better than nothing. I would say that when you need a highly available cache and consistency is not a big issue, using a Redis cluster ala memcached (client side consistent hashing among N masters) is the way to go.
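For completeness, a toy sketch of that memcached-style pattern: the client hashes every key onto a ring of N independent masters, with no replication and no failover involved at all. The node addresses are placeholders and real clients do more (pooling, weights), but the idea is this simple:

    # Toy client-side consistent hashing among N independent Redis masters,
    # memcached style. Illustration only: node addresses are placeholders.
    import bisect
    import hashlib

    class Ring:
        def __init__(self, nodes, vnodes=100):
            points = []
            for node in nodes:
                for i in range(vnodes):                  # virtual nodes for better balance
                    points.append((self._hash(f"{node}:{i}"), node))
            points.sort()
            self._hashes = [h for h, _ in points]
            self._nodes = [n for _, n in points]

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def node_for(self, key):
            idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
            return self._nodes[idx]

    ring = Ring(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
    print(ring.node_for("user:1000"))   # maps to one of the three masters, deterministically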

Note that even without checking the runid, making the replication offsets durable across restarts already improves the behavior considerably. In the above example E would be picked only if, while isolated in the minority partition with the old master, it received more writes than the other slaves in the majority side.

TLDR: we have to fix this. It is not related to restarting masters without a data set, but it is useful to have a more correct implementation. However this will only address a class of very-hard-to-trigger issues.

This has been in my TODO list for some time now, and congrats to Aphyr for identifying a real implementation issue in a matter of a few exchanged tweets. About the failure reported by Aphyr from the unknown company: I don’t think it is currently viable to try to protect against serious misconfigurations. However it is a clear sign that we need better Sentinel docs, ones which are more incremental compared to those we have now, which try to describe how the system works. A wiser approach could be to start from a common sane configuration and a “don’t do” list, like: don’t turn persistence off, unless you are ok with wiped instances.