Oracle Grid Infrastructure: Understanding Split
APPLIES TO: Oracle Database - Enterprise Edition - Version 11.2.0.1 and later Information in this document applies to any platform. PURPOSE The purpose of this note is to explain split-brain node evictions in Oracle Clusterware release 11.
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and laterInformation in this document applies to any platform.
PURPOSE
The purpose of this note is to explain split-brain node evictions in Oracle Clusterware release 11.2
SCOPE
The intended audience of this note is Oracle Clusterware 11.2 administrators at any level of expertise. As written, this note applies only to 11.2.
DETAILS
Missed network heartbeat (NHB) evictions happen when ocssd of the surviving node loses contact with the evicted node over the interconnect. The nodes must be able to communicate over the interconnect to avoid a "split brain" situation. In the case of a "split brain" node eviction, one node aborted itself to avoid "split brain" when communication over the interconnect was compromised.
What does "split brain" mean?
"Split brain" means that there are 2 or more distinct sets of nodes, or "cohorts", with no communication between the two cohorts.
For example:
Suppose there are 4 nodes named A, B, C, D, in the following situation
* Nodes A,B can talk to each other; nodes C,D can talk to each other
* But A and B cannot talk to C or D, and vice versa
Then there are two cohorts: {A, B} and {C, D}.
Why is this a problem?
In a split-brain situation, there are in a sense two (or more) separate clusters working on the same shared storage. This has the potential for data corruption. So the split-brain must be resolved.
How does the clusterware resolve a "split brain" situation?
Oracle Clusterware handles the split-brain by terminating all the nodes in the SMALLER cohort.
If both of the cohorts are the same size, the cohort with the lowest numbered node in it survives.
The clusterware identifies the LARGEST cohort, and aborts all the nodes which do NOT belong to that cohort.
Identifying a split-brain eviction
In a split-brain node eviction, the following message is present in the ocssd log ($GRID_HOME/log/
clssnmCheckDskInfo: Aborting local node to avoid splitbrain.
And earlier in the same log, within 10 minutes prior to "clssnmCheckDskInfo: Aborting local node" message:
clssnmPollingThread: node %s (%n) at
Finding the cohort
The split-brain message in the ocssd.log will show "cohort" information. For example:
2012-12-28 20:26:25.803: [ CSSD][1111296320]clssnmCheckDskInfo: My cohort: 1
2012-12-28 20:26:25.803: [ CSSD][1111296320]clssnmCheckDskInfo: Surviving cohort: 2,3,4
2012-12-28 20:26:25.803: [ CSSD][1111296320](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 1, sprora01, is smaller than cohort of 3 nodes led by node 2, sprora02, based on map type 2
Understanding the cohort message
In a split-brain situation, ocssd on each node records on the voting disk the set of nodes it can communicate with. Each set is known as a "cohort". When there are two (or more) mutually
non-intersecting sets, we have a "split-brain" situation. It means that there are two (or more) separate sets of nodes which cannot talk to each other over the interconnect.
For example, in the above quote
My cohort: 1
Surviving cohort: 2,3,4
The meaning of these messages is
* "My cohort: 1" => The list of nodes I can communicate with: 1
* "Surviving cohort: 2,3,4" => From the voting disk, I know that nodes 2,3,4 can all communicate with each other.
* "Cohort of 1 nodes with leader 1, sprora01, is smaller than cohort of 3 nodes led by node 2, sprora02"
=> Oracle Clusterware has identified that the cohort {1} is smaller than the cohort {2,3,4}.
Oracle Clusterware handles the split-brain by terminating all the nodes in the SMALLER cohort. In this case, the smaller cohort is {1}. Therefore, ocssd on node {1} aborts the node.
Using the cohort message to identify interconnect network issues
The cohort message describes which nodes can communicate with each other.
Each cohort is a set of nodes that can talk to each other, and cannot talk to the nodes NOT in the cohort.
In the above example, the cohort message tells us that nodes {2,3,4} are all in communication; node 1 is not in communication with any of them.
Follow-up Action
The private network between node 1 and the other 3 nodes should be checked.
Please refer to the following note to check private interconnect network: Document 1534949.1 - Oracle Grid Infrastructure: How to Troubleshoot
Missed Network Heartbeat Evictions
===================================================
在理解脑裂(Brain Split)处理过程前,有必要介绍一下Oracle RAC Css(Cluster Synchronization Services)的工作框架:
Oracle RAC CSS提供2种后台服务包括群组管理(Group Managment简称GM)和节点监控(Node Monitor简称NM),其中GM管理组(group)和锁(lock)服务。在集群中任意时刻总有一个节点会充当GM主控节点(master node)。集群中的其他节点串行地将GM请求发送到主控节点(master node),而master node将集群成员变更信息广播给集群中的其他节点。组成员关系(group membership)在每次发生集群重置(cluster reconfiguration)时发生同步。每一个节点独立地诠释集群成员变化信息。
而节点监控NM服务则负责通过skgxn(skgxn-libskgxn.a,提供节点监控的库)与其他厂商的集群软件保持节点信息的一致性。此外NM还提供对我们熟知的网络心跳(Network heartbeat)和磁盘心跳(Disk heartbeat)的维护以保证节点始终存活着。当集群成员没有正常Network heartbeat或Disk heartbeat时NM负责将成员踢出集群,被踢出集群的节点将发生节点重启(reboot)。
NM服务通过OCR中的记录(OCR中记录了Interconnect的信息)来了解其所需要监听和交互的端点,将心跳信息通过网络发送到其他集群成员。同时它也监控来自所有其他集群成员的网络心跳Network heartbeat,每一秒钟都会发生这样的网络心跳,若某个节点的网络心跳在misscount(by the way:10.2.0.1中Linux上默认misscount为60s,其他平台为30s,若使用了第三方vendor clusterware则为600s,但10.2.0.1中未引入disktimeout;10.2.0.4以后misscount为60s,disktimeout为200s;11.2以后misscount为30s:CRS-4678: Successful get misscount 30 for Cluster Synchronization Services,CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services)指定的秒数中都没有被收到的话,该节点被认为已经”死亡”了。NM还负责当其他节点加入或离开集群时初始化集群的重置(Initiates cluster reconfiguration)。
在解决脑裂的场景中,NM还会监控voting disk以了解其他的竞争子集群(subclusters)。关于子集群我们有必要介绍一下,试想我们的环境中存在大量的节点,以Oracle官方构建过的128个节点的环境为我们的想象空间,当网络故障发生时存在多种的可能性,一种可能性是全局的网络失败,即128个节点中每个节点都不能互相发生网络心跳,此时会产生多达128个的信息”孤岛”子集群。另一种可能性是局部的网络失败,128个节点中被分成多个部分,每个部分中包含多于一个的节点,这些部分就可以被称作子集群(subclusters)。当出现网络故障时子集群内部的多个节点仍能互相通信传输投票信息(vote mesg),但子集群或者孤岛节点之间已经无法通过常规的Interconnect网络交流了,这个时候NM Reconfiguration就需要用到voting disk投票磁盘。
因为NM要使用voting disk来解决因为网络故障造成的通信障碍,所以需要保证voting disk在任意时刻都可以被正常访问。在正常状态下,每个节点都会进行磁盘心跳活动,具体来说就是会到投票磁盘的某个块上写入disk心跳信息,这种活动每一秒钟都会发生,同时CSS还会每秒读取一种称作”kill block”的”赐死块”,当”kill block”的内容表示本节点被驱逐出集群时,CSS会主动重启节点。
为了保证以上的磁盘心跳和读取”kill block”的活动始终正常运作CSS要求保证至少(N/2+1)个投票磁盘要被节点正常访问,这样就保证了每2个节点间总是至少有一个投票磁盘是它们都可以正常访问的,在正常情况下(注意是风平浪静的正常情况)只要节点所能访问的在线voting disk多于无法访问的voting disk,该节点都能幸福地活下去,当无法访问的voting disk多于正常的voting disk时,Cluster Communication Service进程将失败并引起节点重启。所以有一种说法认为voting disk只要有2个足以保证冗余度就可以了,没有必要有3个或以上voting disk,这种说法是错误的。Oracle推荐集群中至少要有3个voting disks。

热AI工具

Undresser.AI Undress
人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover
用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool
免费脱衣服图片

Clothoff.io
AI脱衣机

AI Hentai Generator
免费生成ai无尽的。

热门文章

热工具

记事本++7.3.1
好用且免费的代码编辑器

SublimeText3汉化版
中文版,非常好用

禅工作室 13.0.1
功能强大的PHP集成开发环境

Dreamweaver CS6
视觉化网页开发工具

SublimeText3 Mac版
神级代码编辑软件(SublimeText3)

热门话题

Oracle 环境变量配置指南:创建 ORACLE_HOME 环境变量,指向 Oracle 主目录。将 Oracle 二进制文件目录添加到 PATH 环境变量。设置 TNS_ADMIN 环境变量(如果使用 TNS 命名文件)。验证环境变量设置,确保输出显示已设置的变量。

Oracle 提供多种去重查询方法:DISTINCT 关键字返回每列的唯一值。GROUP BY 子句对结果分组并返回每个分组的非重复值。UNIQUE 关键字用于创建仅包含唯一行的索引,查询该索引将自动去重。ROW_NUMBER() 函数分配唯一数字并过滤出仅包含第 1 行的结果。MIN() 或 MAX() 函数可返回数字列的非重复值。INTERSECT 运算符返回两个结果集的公共值(无重复项)。

Oracle 乱码问题可以通过以下步骤解决:检查数据库字符集以确保与数据相匹配。设置客户端字符集以与数据库相匹配。转换数据或修改列字符集以匹配数据库字符集。使用 Unicode 字符集,并避免多字节字符集。检查数据库和客户端的语言设置是否正确。

存储过程是一组可存储在数据库中的 SQL 语句,可作为独立单元重复调用。它们可以接受参数(IN、OUT、INOUT),并提供代码重用、安全性、性能和模块化的优势。示例:创建存储过程 calculate_sum 来计算两个数字的总和并将其存储在 OUT 参数中。

要查询 Oracle 表空间大小,请遵循以下步骤:确定表空间名称,方法是运行查询:SELECT tablespace_name FROM dba_tablespaces;查询表空间大小,方法是运行查询:SELECT sum(bytes) AS total_size, sum(bytes_free) AS available_space, sum(bytes) - sum(bytes_free) AS used_space FROM dba_data_files WHERE tablespace_

使用 ALTER TABLE 语句,具体语法如下:ALTER TABLE table_name ADD column_name data_type [constraint-clause]。其中:table_name 为表名,column_name 为字段名,data_type 为数据类型,constraint-clause 为可选的约束。示例:ALTER TABLE employees ADD email VARCHAR2(100) 为 employees 表添加 email 字段。

数据导入方法:1. 使用 SQLLoader 实用程序:准备数据文件、创建控制文件、运行 SQLLoader;2. 使用 IMP/EXP 工具:导出数据、导入数据。提示:1. 大数据集推荐 SQL*Loader;2. 目标表应存在,列定义匹配;3. 导入后需验证数据完整性。

主键是唯一标识表中每一行的特殊列或列组合,它确保表中的记录都是独一无二的,可以通过以下步骤创建:使用 ALTER TABLE 语句指定表名。添加 PRIMARY KEY 关键字后跟要指定为主键的列名。主键约束有助于确保数据唯一性、提高查询速度、防止重复记录并简化表连接。
