데이터 베이스 MySQL 튜토리얼 Redis Cluster and limiting divergences.

Redis Cluster and limiting divergences.

Jun 07, 2016 pm 04:35 PM
and cluster div redis

Redis Cluster is finally on its road to reach the first stable release in a short timeframe as already discussed in the Redis google group [1]. However despite a design never proposed for the implementation of Redis Cluster was analyzed an

Redis Cluster is finally on its road to reach the first stable release in a short timeframe as already discussed in the Redis google group [1]. However despite a design never proposed for the implementation of Redis Cluster was analyzed and discussed at long in the past weeks (unfortunately creating some confusion: many people, including notable personalities of the NoSQL movement, confused the analyzed proposal with Redis Cluster implementation), no attempt was made to analyze or categorize Redis Cluster itself.

I believe that putting in perspective the simple ideas that Redis Cluster implements, in a more formal way, is an interesting exercise for the following reason: Redis Cluster is not a design that tries to achieve “AP” or “CP” of the CAP theorem, since for its goals CAP Availability and CAP Consistency are too hard goals to reach without sacrificing other practical qualities.

Once a design does not try to maximize what is theoretically possible, the design space becomes much larger, and the implementation main goal is to try to provide some Availability and some reasonable form of Consistency, in the face of other conflicting design requirements like asynchronous replication of data.

The goal of this article is to reply to the following question: how Redis Cluster, as an asynchronous system, tries to limit divergences between nodes?

Nodes divergence
===

One of the main problems with asynchronous systems is that a master accepting requests never actually knows in a given moment if it is still the authoritative master or a stale one. For example imagine a cluster of three nodes A, B, C, with one replica each, A1, B1, C1. When a master is partitioned away with some client, A1 may be elected in the other side as the new master, but A is not able, for every request processed, to verify with the other nodes if the request should be accepted or not. At best node A can get asynchronous acknowledges from replicas.

Every time this happens, two parallel time lines for the same set of data is created, one in A, and one in A1. In Redis Cluster there is no way to merge data as explained in [2], so you can imagine the merge function between A and A1 data set when the partition heals as just picking one time line between all the time lines created (two in this case).

Another case that creates different time lines is the replication process itself.

A master A may have three replicas A1, A2, A3. Because of the very concept of asynchronous replication, each of the slaves may represent a point in time in the past history of the time line of A. Usually since replication in Redis is very fast and has minimal delay, as data is transmitted to slaves at the same time as the reply is transmitted to the writing client, the time “delta” between A and the slaves is small. However slaves may be lagging for some reason, so it is possible that, for example, A1 is one second in the past.

Asynchronously replicated nodes can’t really avoid diverging, however there is no need for the divergence between replicas to be unbound. Actually systems allowing replicas to diverge in an uncontrolled way may be hard to use in practice, even when for use case does not requiring strong consistency, that is the target of Redis Cluster.

Putting bounds to divergence
===

Redis cluster uses a few heuristics in order to limit divergence of nodes. The algorithms employed are very simple, consisting merely on the following four rules that nodes follow.

1) When a master is isolated from the majority for node-timeout (that is the user configured time after a non responding node is considered to be failing by the failure detection algorithm), it stops accepting queries from clients. It is easy to see how this helps in practice: in the majority side of the cluster, no slave is able to get elected and replace the master before node-timeout has elapsed. However, after node-timeout time has elapsed, the isolated master knows that it is possible that a parallel history was created in the other side of the cluster, and this history will win over its history, so it stops accepting data that will be otherwise lost. This means that if a master is partitioned away together with one or more clients, the window for data loss is node-timeout, that is usually in the order of 500 — 1000 milliseconds. If the partition heals before node-timeout, no data loss happens as the master rejoins the cluster as master.

2) A related problem is, when a master is failing, how to pick the “best” history among the available histories for the same data (depending on the number of slaves). An algorithm is used in order to give an advantage to the slave with the most updated replication offset to be elected, that is, the slave that likely has the most recent data compared to the master that went down. However if the best slave fails to be elected the other slaves will try an election as well.

3) If you think at “2” you’ll see that actually this means that the divergence between a master and its slaves is unbound. A slave can be lagging for hours in theory. So actually there is another heuristic in use, consisting on slaves to don’t try to get elected at all if the last time they received data from the master is too much in the past. This maximum time is currently set to ten times node-timeout, however it will be user-configurable in the stable release of Redis Cluster. While usually the lag between master and slaves is in the sub-millisecond figure, this time limit ensures that the worst case scenario is the following:

- A slave is stopped/unavailable for just a little less than ten times node-timeout.
- Its master fails.
- At the same time the slave returns back available.

Rejoins went wrong
===

There is another failure mode that was worth covering in Redis Cluster as it is in some way a different instance of the same problems covered so far. What happens if a master rejoins the cluster after it was already failed over, and there are still clients with non updated configuration writing to it?

This may happens in two main ways: rejoining the majority after a partition, and restarting the process. The failure mode is conceptually the same, the master is not able to get a synchronous acknowledge from other replicas about every write, and the other nodes may take a few milliseconds before being able to reconfigure a node that just rejoined (the node is usually reconfigured with an UPDATE message as soon as it is detected to have a stale configuration: this usually happens immediately once the rejoining instance pings another node or sends a pong in reply to a ping).

Rejoins are handled with another heuristic:

4) When a node rejoins the majority, or is restarted, it waits a small time (but yet a few order of magnitudes bigger than the usual network latency), before accepting writes again, in order to maximize the probability to get reconfigured before accepting writes from clients with stale information.

History re-play
===

Some of the Redis data structures and operations are commutative. Obvious examples are INCR and SADD, the order of operations does not matter and eventually the Set or the counter will have the same exact value as long as all the operations are executed.

Because of this observation, and since Redis instances get asynchronous acknowledges from slaves about how much data was processed, it is possible for partitioned masters to remember commands sent by clients that are still not acknowledged by all the replicas.

This trick is able to improve data safety in a way similar to AP systems, but while in AP systems merging values is used, Redis Cluster would reply commands from clients instead.

I proposed this idea in a blog post [3] some time ago, however this is a practical example of what would happen implementing it in a next version of Redis Cluster:

- A master gets partitioned away with clients.
- Clients write to the master for node-timeout time.
- The master starts returning errors.
- In the majority side, a slave is elected as the new master.
- When the old master rejoins it is reconfigured as a replica of the new master.

So far this is the description of what happens currently. With node replying what would happen is that all the writes not acknowledged, from the time the partition is created, to the time the master starts to reply with errors, are accumulated. When the partition heals, as part of turning the old master into a slave, the old master would connect with the new master, and re-play the accumulated stream of commands.

How all this is tested?
===

Some time ago I wrote about what I use in order to test Redis Cluster [4]. The most valuable tool I found so far is a simple consistency-test that is part of the redis-rb-cluster project [5] (a Ruby Redis Cluster client). Basically stress testing the system is as simple as keeping the consistency test running, while simulating different partitions, restarts, and other failures in the cluster.

This test was so useful and is so simple to run, that I’m actually starting to think that everybody running a distributed database in production, whatever it is Cassandra or Zookeeper or Redis or whatever else, should keep a similar test running against the production system as a way to monitor what is happening.

Such tests are lightweight to run, they can be made to just set a few keys per second. However they can easily detect issues with the implementation or other unexpected consistency issues. Especially systems merging data that are at the same time always available, such as AP systems, have the tendency to “mask” bugs: it takes some luck to discover some consistency leak in real data sets. With a simple consistency test running instead it is possible to monitor how the system is behaving.

The following for example is the output of constency-test.rb running against my testing environment:

27523967 R (7187 err) | 27523968 W (7186 err) | 12 noack |

So I know that I read/wrote 27 million of times during my testing, and 12 writes that received no acknowledge were actually materialized inside the database.

Notes:

[1] https://groups.google.com/d/msg/redis-db/2laQRKBKkYg/ssaiQLhasNkJ
[2] http://antirez.com/news/67
[3] http://antirez.com/news/68
[4] http://antirez.com/news/69
[5] https://github.com/antirez/redis-rb-cluster Comments
본 웹사이트의 성명
본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.

핫 AI 도구

Undresser.AI Undress

Undresser.AI Undress

사실적인 누드 사진을 만들기 위한 AI 기반 앱

AI Clothes Remover

AI Clothes Remover

사진에서 옷을 제거하는 온라인 AI 도구입니다.

Undress AI Tool

Undress AI Tool

무료로 이미지를 벗다

Clothoff.io

Clothoff.io

AI 옷 제거제

Video Face Swap

Video Face Swap

완전히 무료인 AI 얼굴 교환 도구를 사용하여 모든 비디오의 얼굴을 쉽게 바꾸세요!

뜨거운 도구

메모장++7.3.1

메모장++7.3.1

사용하기 쉬운 무료 코드 편집기

SublimeText3 중국어 버전

SublimeText3 중국어 버전

중국어 버전, 사용하기 매우 쉽습니다.

스튜디오 13.0.1 보내기

스튜디오 13.0.1 보내기

강력한 PHP 통합 개발 환경

드림위버 CS6

드림위버 CS6

시각적 웹 개발 도구

SublimeText3 Mac 버전

SublimeText3 Mac 버전

신 수준의 코드 편집 소프트웨어(SublimeText3)

Redis 클러스터 모드를 구축하는 방법 Redis 클러스터 모드를 구축하는 방법 Apr 10, 2025 pm 10:15 PM

Redis Cluster Mode는 Sharding을 통해 Redis 인스턴스를 여러 서버에 배포하여 확장 성 및 가용성을 향상시킵니다. 시공 단계는 다음과 같습니다. 포트가 다른 홀수 redis 인스턴스를 만듭니다. 3 개의 센티넬 인스턴스를 만들고, Redis 인스턴스 및 장애 조치를 모니터링합니다. Sentinel 구성 파일 구성, Redis 인스턴스 정보 및 장애 조치 설정 모니터링 추가; Redis 인스턴스 구성 파일 구성, 클러스터 모드 활성화 및 클러스터 정보 파일 경로를 지정합니다. 각 redis 인스턴스의 정보를 포함하는 Nodes.conf 파일을 작성합니다. 클러스터를 시작하고 Create 명령을 실행하여 클러스터를 작성하고 복제본 수를 지정하십시오. 클러스터에 로그인하여 클러스터 정보 명령을 실행하여 클러스터 상태를 확인하십시오. 만들다

Redis 데이터를 지우는 방법 Redis 데이터를 지우는 방법 Apr 10, 2025 pm 10:06 PM

Redis 데이터를 지우는 방법 : Flushall 명령을 사용하여 모든 키 값을 지우십시오. FlushDB 명령을 사용하여 현재 선택한 데이터베이스의 키 값을 지우십시오. 선택을 사용하여 데이터베이스를 전환 한 다음 FlushDB를 사용하여 여러 데이터베이스를 지우십시오. del 명령을 사용하여 특정 키를 삭제하십시오. Redis-Cli 도구를 사용하여 데이터를 지우십시오.

Redis 대기열을 읽는 방법 Redis 대기열을 읽는 방법 Apr 10, 2025 pm 10:12 PM

Redis의 대기열을 읽으려면 대기열 이름을 얻고 LPOP 명령을 사용하여 요소를 읽고 빈 큐를 처리해야합니다. 특정 단계는 다음과 같습니다. 대기열 이름 가져 오기 : "큐 :"와 같은 "대기열 : my-queue"의 접두사로 이름을 지정하십시오. LPOP 명령을 사용하십시오. 빈 대기열 처리 : 대기열이 비어 있으면 LPOP이 NIL을 반환하고 요소를 읽기 전에 대기열이 존재하는지 확인할 수 있습니다.

Centos redis에서 lua 스크립트 실행 시간을 구성하는 방법 Centos redis에서 lua 스크립트 실행 시간을 구성하는 방법 Apr 14, 2025 pm 02:12 PM

CentOS 시스템에서는 Redis 구성 파일을 수정하거나 Redis 명령을 사용하여 악의적 인 스크립트가 너무 많은 리소스를 소비하지 못하게하여 LUA 스크립트의 실행 시간을 제한 할 수 있습니다. 방법 1 : Redis 구성 파일을 수정하고 Redis 구성 파일을 찾으십시오. Redis 구성 파일은 일반적으로 /etc/redis/redis.conf에 있습니다. 구성 파일 편집 : 텍스트 편집기 (예 : VI 또는 Nano)를 사용하여 구성 파일을 엽니 다. Sudovi/etc/redis/redis.conf LUA 스크립트 실행 시간 제한을 설정 : 구성 파일에서 다음 줄을 추가 또는 수정하여 LUA 스크립트의 최대 실행 시간을 설정하십시오 (Unit : Milliseconds).

Redis 명령 줄을 사용하는 방법 Redis 명령 줄을 사용하는 방법 Apr 10, 2025 pm 10:18 PM

Redis Command Line 도구 (Redis-Cli)를 사용하여 다음 단계를 통해 Redis를 관리하고 작동하십시오. 서버에 연결하고 주소와 포트를 지정하십시오. 명령 이름과 매개 변수를 사용하여 서버에 명령을 보냅니다. 도움말 명령을 사용하여 특정 명령에 대한 도움말 정보를 봅니다. 종금 명령을 사용하여 명령 줄 도구를 종료하십시오.

Redis 카운터를 구현하는 방법 Redis 카운터를 구현하는 방법 Apr 10, 2025 pm 10:21 PM

Redis Counter는 Redis Key-Value Pair 스토리지를 사용하여 다음 단계를 포함하여 계산 작업을 구현하는 메커니즘입니다. 카운터 키 생성, 카운트 증가, 카운트 감소, 카운트 재설정 및 카운트 얻기. Redis 카운터의 장점에는 빠른 속도, 높은 동시성, 내구성 및 단순성 및 사용 편의성이 포함됩니다. 사용자 액세스 계산, 실시간 메트릭 추적, 게임 점수 및 순위 및 주문 처리 계산과 같은 시나리오에서 사용할 수 있습니다.

Redis 만료 정책을 설정하는 방법 Redis 만료 정책을 설정하는 방법 Apr 10, 2025 pm 10:03 PM

REDIS 데이터 만료 전략에는 두 가지 유형이 있습니다. 정기 삭제 : 만료 된 기간 캡-프리브-컨트 컨트 및 만료 된 시간 캡-프레임 딜레이 매개 변수를 통해 설정할 수있는 만료 된 키를 삭제하기위한주기 스캔. LAZY DELETION : 키를 읽거나 쓰는 경우에만 삭제가 만료 된 키를 확인하십시오. 그것들은 게으른 불쾌한 말입니다. 게으른 유발, 게으른 게으른 expire, Lazyfree Lazy-user-del 매개 변수를 통해 설정할 수 있습니다.

Debian Readdir의 성능을 최적화하는 방법 Debian Readdir의 성능을 최적화하는 방법 Apr 13, 2025 am 08:48 AM

Debian Systems에서 ReadDir 시스템 호출은 디렉토리 내용을 읽는 데 사용됩니다. 성능이 좋지 않은 경우 다음과 같은 최적화 전략을 시도해보십시오. 디렉토리 파일 수를 단순화하십시오. 대규모 디렉토리를 가능한 한 여러 소규모 디렉토리로 나누어 읽기마다 처리 된 항목 수를 줄입니다. 디렉토리 컨텐츠 캐싱 활성화 : 캐시 메커니즘을 구축하고 정기적으로 캐시를 업데이트하거나 디렉토리 컨텐츠가 변경 될 때 캐시를 업데이트하며 readDir로 자주 호출을 줄입니다. 메모리 캐시 (예 : Memcached 또는 Redis) 또는 로컬 캐시 (예 : 파일 또는 데이터베이스)를 고려할 수 있습니다. 효율적인 데이터 구조 채택 : 디렉토리 트래버스를 직접 구현하는 경우 디렉토리 정보를 저장하고 액세스하기 위해보다 효율적인 데이터 구조 (예 : 선형 검색 대신 해시 테이블)를 선택하십시오.

See all articles