Table of Contents
How to prevent split-brain in an HA cluster
1. Introduction
2. How to prevent HA cluster split-brain
3. Is it safe to go without a fence device?
4. Can data be lost after master-slave switching?
5. How to implement the above strategy
6. Reference
Jul 12, 2016 am 09:04 AM
How to prevent split-brain in an HA cluster

1. Introduction

Split-brain occurs in a high-availability (HA) system when two connected nodes lose contact with each other: the system that was originally a whole splits into two independent nodes. The two nodes then compete for shared resources, which results in system chaos and data corruption.

For HA of stateless services, split-brain does not matter; for HA of stateful services (such as MySQL), it must be strictly prevented. (Yet some production systems configure stateful services with an HA setup meant for stateless services, and the results can be imagined...)

2. How to prevent HA cluster split-brain

Generally, two methods are used:
1. Arbitration
When the two nodes disagree, a third-party arbiter decides which one to follow. The arbiter may be a lock service, a shared disk, or something else.
2. Fencing
When the state of a node cannot be determined, kill that node through fencing to ensure that the shared resources are completely released. This presupposes a reliable fence device.

Ideally, neither of the two should be missing.
However, if the nodes do not use shared resources, as in database HA based on master-slave replication, the fence device can be safely omitted and only arbitration kept. In many environments no fence device is available anyway, for example on cloud hosts.

So can we omit arbitration and keep only the fence device?
No. When the two nodes lose contact with each other, they fence each other at the same time. If the fencing method is reboot, the two machines keep restarting. If the fencing method is power-off, then either both nodes die together or one survives. But if the nodes lost contact because one of them has a faulty network card, and the survivor happens to be the faulty node, the outcome is tragic.

So a plain two-node cluster cannot prevent split-brain, whatever you do.

3. Is it safe to go without a fence device?

Take PostgreSQL or MySQL data replication as an example.
In a replication-based setup, the master and slave nodes do not share resources, so there is no problem if both nodes are alive. The question is whether clients will access the node that is supposed to be dead, which brings in the issue of client routing.

There are several methods for client routing: VIP-based, proxy-based, DNS-based, or simply having the client maintain a list of server addresses and determine master and slave by itself. Whichever method is used, the routing must be updated when the master and slave switch.

DNS-based routing is not reliable, because DNS records may be cached by the client and are difficult to clear.

VIP-based routing has some uncertainties. If the node that is supposed to be dead does not remove its VIP, it may cause trouble at any time (even if the new master has updated the ARP cache on all hosts through arping, an IP conflict occurs once the ARP entry on some host expires and that host sends an ARP query). The VIP should therefore be regarded as a special shared resource that must be removed from the faulty node. As for how to remove it, the simplest way is for the faulty node to remove it by itself after discovering that it has lost contact, provided it is still alive (if it is dead, nothing needs removing). What if the process responsible for removing the VIP cannot work? In that case a soft fence device that is not fully reliable (such as ssh) can be used.

Proxy-based routing is more reliable, because the proxy is the only service entrance. As long as the proxy is updated in that one place, clients cannot be misrouted. The proxy's own high availability must also be considered, however.

As for the server address list approach, the client determines master and slave through the backend service (for example, by checking whether the PostgreSQL/MySQL session is in read-only mode). If two masters exist at that point, the client is confused. To prevent this, the original master must stop its service by itself after discovering that it has lost contact, just as with the VIP removal above.
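The client-side check can be sketched roughly as follows. This is a minimal illustration, not a driver API: the function name pick_master and the input mapping are hypothetical, and how the read-only flag is actually queried depends on the database and driver in use.

```python
from typing import Dict, Optional


def pick_master(read_only_by_server: Dict[str, bool]) -> Optional[str]:
    """Return the address of the single writable server, or None.

    read_only_by_server maps a server address to the read-only flag the
    client observed on that server (hypothetical input for this sketch).
    """
    writable = [addr for addr, read_only in read_only_by_server.items()
                if not read_only]
    if len(writable) == 1:
        return writable[0]
    # Zero masters: nothing to write to. Two or more: possible
    # split-brain, so refuse to choose rather than guess.
    return None
```

Returning None when two writable servers are seen makes the client fail loudly instead of writing to an arbitrary one of them.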

Therefore, to keep the faulty node from causing trouble, the faulty node should release its resources by itself after losing contact, and a soft fence can be added to cope with a failure of the process that releases them. On this premise, operating without a reliable physical fence device can be considered safe.
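The self-release rule might look like the sketch below. It rests on an assumption that the text does not spell out: a node treats itself as the isolated side when it can reach neither its peer nor the arbiter. The function name and the release_resources callback (for example, a routine that drops the VIP or stops the database) are hypothetical.

```python
from typing import Callable


def react_to_health_check(peer_reachable: bool,
                          arbiter_reachable: bool,
                          release_resources: Callable[[], None]) -> str:
    """Decide what a node does after one health-check round.

    Assumption for this sketch: a node considers itself isolated when it
    can reach neither its peer nor the arbiter.
    """
    if peer_reachable or arbiter_reachable:
        return "keep"
    # Isolated: give up the VIP / stop the service on our own initiative.
    release_resources()
    return "released"
```

A soft fence (such as an ssh command issued by the surviving node) would sit outside this function, as a fallback for the case where this process itself has stopped working.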

4. Can data be lost after master-slave switching?

Whether data is lost after a master-slave switch and whether split-brain occurs can be considered two different issues. Again, take PostgreSQL or MySQL data replication as an example.

For PostgreSQL, if synchronous streaming replication is configured, no data is lost regardless of whether the routing is correct. A client routed to the wrong node cannot write any data at all: the old master waits forever for feedback from its slave, and that slave, which has since become the master, naturally ignores it. It is not good for this to go on indefinitely, of course, but it gives the cluster monitoring software enough time to correct the routing error.

For MySQL, even semi-synchronous replication may automatically degrade to asynchronous replication after a timeout. To prevent the degradation, set an extremely large rpl_semi_sync_master_timeout while keeping rpl_semi_sync_master_wait_no_slave on (the default). However, if the slave then fails, the master stops as well. The solution is the same as for PostgreSQL: either configure one master and two slaves, which works as long as the two slaves are not down at the same time, or use external cluster monitoring software to switch dynamically between semi-synchronous and asynchronous replication.
If asynchronous replication was configured to begin with, you have already accepted the possibility of losing data. Losing some data during a master-slave switch is then not a big deal, but the number of automatic switches must be controlled. For example, the original master that has been failed over must not be allowed to come back online automatically. Otherwise, if failover was triggered by network jitter, master and slave keep switching back and forth, losing data and destroying data consistency.
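One way to express the "limit automatic switches, keep the old master offline" rule is a small guard object. This is only a sketch: the class FailoverGuard, its method names, and the default limit of one automatic failover are all assumptions made for illustration.

```python
class FailoverGuard:
    """Limit automatic failovers and keep a demoted master offline."""

    def __init__(self, max_auto_failovers: int = 1) -> None:
        self.max_auto_failovers = max_auto_failovers
        self.failover_count = 0
        self.demoted = set()  # nodes that need manual re-admission

    def may_fail_over(self) -> bool:
        """True while the automatic failover budget is not used up."""
        return self.failover_count < self.max_auto_failovers

    def record_failover(self, old_master: str) -> None:
        """Count one failover and bar the old master from rejoining."""
        self.failover_count += 1
        self.demoted.add(old_master)

    def may_rejoin_automatically(self, node: str) -> bool:
        return node not in self.demoted

    def readmit(self, node: str) -> None:
        """Called by an operator after checking the node's data."""
        self.demoted.discard(node)
```

With a budget of one, a failover caused by network jitter cannot bounce back: the old master stays out until an operator calls readmit.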

5. How to implement the above strategy

You can implement a script that follows the above logic from scratch, but I prefer to build on mature cluster software such as Pacemaker+Corosync with suitable resource agents. I strongly advise against Keepalived: it is not suitable for HA of stateful services, and even if you add arbitration and fencing to the solution, it always feels awkward.

There are also some points to note when using Pacemaker+Corosync.
1) Understand the functions and principles of the resource agent
Only by understanding the functions and principles of a resource agent can you know which scenarios it suits. For example, the pgsql resource agent is fairly complete: it supports synchronous and asynchronous streaming replication, can switch automatically between the two, and ensures that no data is lost under synchronous replication. The current MySQL resource agent, by contrast, is very weak: without GTID and without log compensation, it loses data easily. It is better not to use it and to stay with MHA (but be sure to guard against split-brain when deploying MHA).

2) Ensure quorum
Quorum can be regarded as Pacemaker's own arbitration mechanism. A majority of all nodes in the cluster elects a coordinator, and all instructions in the cluster are issued by this coordinator, which cleanly eliminates the split-brain problem. For this mechanism to work effectively, the cluster must have at least 3 nodes and no-quorum-policy must be set to stop, which is also the default. (Many tutorials set no-quorum-policy to ignore for convenience of demonstration. Doing this in production without any other arbitration mechanism is very dangerous!)
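The majority arithmetic behind the "at least 3 nodes" requirement can be illustrated as follows. This shows only the counting rule, assuming one vote per node; it is not Pacemaker's or Corosync's implementation, and has_quorum is a hypothetical name.

```python
def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """True if this partition holds a strict majority of all votes.

    reachable_nodes counts the local node itself. One vote per node is
    assumed in this sketch.
    """
    return 2 * reachable_nodes > total_nodes


# Two-node cluster split in half: neither side has a majority.
two_node_split = has_quorum(total_nodes=2, reachable_nodes=1)

# Three-node cluster split 2/1: only the side with two nodes continues.
three_node_major = has_quorum(total_nodes=3, reachable_nodes=2)
three_node_minor = has_quorum(total_nodes=3, reachable_nodes=1)
```

Here two_node_split and three_node_minor are False while three_node_major is True, which is why a two-node cluster cannot decide by majority alone.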

But what if there are only 2 nodes?
The first method is to borrow a machine to make up 3 nodes, and then set location constraints so that no resources are allocated to that node.
The second method is to pull together several small clusters that cannot reach quorum on their own into one large cluster, again using location constraints to control where resources are allocated.

But suppose you have many two-node clusters, cannot find that many spare nodes to make up the numbers, and do not want to merge the two-node clusters into a large one (for example, because it is inconvenient to manage). Then consider the third method.
The third method is to configure a preemptible resource, together with the service and a colocation constraint tying the service to that resource. Whichever node seizes the preemptible resource provides the service. The preemptible resource can be a lock service, for example one wrapped around ZooKeeper, or simply one built from scratch, as in the following example.
http://my.oschina.net/hanhanztj/blog/515065
(This example uses short-lived connections over the HTTP protocol. A more thorough approach is to use a long-lived connection with heartbeat detection, so that the server can promptly detect a disconnection and release the lock.)
However, the preemptible resource must itself be highly available. You can make the service that provides it into another HA cluster, or, more simply, deploy 3 lock services: one on each of the two nodes, and the third on a separate, dedicated arbitration node. The lock is considered acquired when at least 2 of the 3 locks are obtained. This arbitration node can serve many clusters. (A machine can host only one Pacemaker instance; otherwise, an arbitration node running N Pacemaker instances could do the same job.) Still, unless you have no other choice, prefer the earlier methods, that is, satisfying Pacemaker's quorum, which is simpler and more reliable.
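The "at least 2 of 3 locks" rule can be sketched as below. The lock clients are hypothetical callables supplied by the caller; the sketch deliberately leaves out lock expiry and the release of partially acquired locks, both of which a real deployment would need.

```python
from typing import Callable, Sequence


def acquire_majority(try_lock_fns: Sequence[Callable[[], bool]]) -> bool:
    """Try every lock service once; succeed on a strict majority.

    Each callable is a hypothetical client for one lock service: it
    returns True if the lock was obtained and may raise on failure.
    """
    acquired = 0
    for try_lock in try_lock_fns:
        try:
            if try_lock():
                acquired += 1
        except Exception:
            # An unreachable lock service counts as "not acquired".
            pass
    return 2 * acquired > len(try_lock_fns)
```

With three lock services, a node that loses its network still reaches its own local lock service but neither of the other two, so it cannot win the lock.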

6. Reference

http://blog.chinaunix.net/uid-20726500-id-4461367.html
http://my.oschina.net/hanhanztj/blog/515065
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
http://mysqllover.com/?p=799
http://gmt-24.net/archives/1077
