Resource Manager HA配置详解
YARN中的资源管理器(Resource Manager)负责整个系统的资源管理和调度,并内部维护了各个应用程序的ApplictionMaster信息,NodeManager信息,资源使用信息等。在2.4版本之后,Hadoop Common同样提供了HA的功能,解决了这样一个基础服务的可靠性和容错性问题
YARN中的资源管理器(Resource Manager)负责整个系统的资源管理和调度,并内部维护了各个应用程序的ApplictionMaster信息,NodeManager信息,资源使用信息等。在2.4版本之后,Hadoop Common同样提供了HA的功能,解决了这样一个基础服务的可靠性和容错性问题。其架构如下:
Overview of ResourceManager High Availability
RM HA与NN HA有诸多相同之处(NameNode HA配置详解 ):
(1). Active/Standby架构,同一时间只有一个RM处于活动状态(如上图所示)。
(2). 依赖zooKeeper实现。手动切换使用yarn rmadmin命令(类似hdfs haadmin命令),而自动故障转移使用ZKFailoverController。但不同的是,zkfc只作为RM中一个线程而非独立的守护进程来启动。
(3). 当存在多个RM时,客户端使用的yarn-site.xml需要指定RM的列表。 客户端, ApplicationMasters (AMs)和NodeManagers (NMs) 会以轮训的方式寻找活动状态的RM,也就是说AM
s和NMs需要自己提供容错机制。如果当前活动状态的RM挂掉了,那么会继续使用轮训的方式找到新的RM。这种逻辑的实现需要在yarn.client.failover-proxy-provider中指定使用的类:org.apache.hadoop.yarn.client.RMFailoverProxyProvider
此外,新的RM可以恢复之前RM的状态(详见ResourceManger Restart )。当启动RM Restart,重启后的RM就加载之前活动RM的状态信息并继续之前RM的操作,这样应用程序定期执行检查点操作,就可以避免工作内容的丢失。在Active/standby的RM中,活动RM的状态数据需要active和standby都能访问,使用共享文件系统方法(FileSystemRMStateStore )或者zooKeeper方法(ZKRMStateStore)。后者在同一时间只允许一个RM有写入权限。
一个常见的YARN RM HA配置如下:
yarn.resourcemanager.ha.enabled true yarn.resourcemanager.ha.rm-ids rm1,rm2 yarn.resourcemanager.hostname.rm1 debugo01 yarn.resourcemanager.hostname.rm2 debugo02 yarn.resourcemanager.recovery.enabled true yarn.resourcemanager.store.class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore yarn.resourcemanager.zk-address debugo01:2181,debugo02:2181,debugo03:2181 For multiple zk services, separate them with comma yarn.resourcemanager.cluster-id yarn-ha yarn.resourcemanager.ha.automatic-failover.enabled true Enable automatic failover; By default, it is enabled only when HA is enabled. yarn.resourcemanager.ha.automatic-failover.zk-base-path /yarn-leader-election Optional setting. The default value is /yarn-leader-election yarn.client.failover-proxy-provider org.apache.hadoop.yarn.client.RMFailoverProxyProvider
同时,yarn RM服务监听地址的设置要修改成下面的方式:
启动RM
start-yarn.sh
在standby的节点单独启动RM(也可使用start-yarn.sh脚本)
检查状态:
$ yarn rmadmin -getServiceState rm1 active $ yarn rmadmin -getServiceState rm2 standby
访问rm2节点的nodemanager会提示
This is standby RM. Redirecting to the current active RM: http://debugo01:8188/cluster/apps
下面KILL掉rm1的resourcemanager
$ yarn rmadmin -getServiceState rm2 active ?$ yarn rmadmin -getServiceState rm1 14/09/14 03:08:23 INFO ipc.Client: Retrying connect to server: debugo01/192.168.46.201:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS) Operation failed: Call From debugo01/192.168.46.201 to debugo01:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
参考
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-ha-in-cdh5/
原文地址:Resource Manager HA配置详解, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Title: The working principle and configuration method of GDM in Linux systems In Linux operating systems, GDM (GNOMEDisplayManager) is a common display manager used to control graphical user interface (GUI) login and user session management. This article will introduce the working principle and configuration method of GDM, as well as provide specific code examples. 1. Working principle of GDM GDM is the display manager in the GNOME desktop environment. It is responsible for starting the X server and providing the login interface. The user enters

Windows operating system is one of the most popular operating systems in the world, and its new version Win11 has attracted much attention. In the Win11 system, obtaining administrator rights is an important operation. Administrator rights allow users to perform more operations and settings on the system. This article will introduce in detail how to obtain administrator permissions in Win11 system and how to effectively manage permissions. In the Win11 system, administrator rights are divided into two types: local administrator and domain administrator. A local administrator has full administrative rights to the local computer

Understanding Linux Bashrc: Function, Configuration and Usage In Linux systems, Bashrc (BourneAgainShellruncommands) is a very important configuration file, which contains various commands and settings that are automatically run when the system starts. The Bashrc file is usually located in the user's home directory and is a hidden file. Its function is to customize the Bashshell environment for the user. 1. Bashrc function setting environment

Detailed explanation of division operation in OracleSQL In OracleSQL, division operation is a common and important mathematical operation, used to calculate the result of dividing two numbers. Division is often used in database queries, so understanding the division operation and its usage in OracleSQL is one of the essential skills for database developers. This article will discuss the relevant knowledge of division operations in OracleSQL in detail and provide specific code examples for readers' reference. 1. Division operation in OracleSQL

Title: How to configure and install FTPS in Linux system, specific code examples are required. In Linux system, FTPS is a secure file transfer protocol. Compared with FTP, FTPS encrypts the transmitted data through TLS/SSL protocol, which improves Security of data transmission. In this article, we will introduce how to configure and install FTPS in a Linux system and provide specific code examples. Step 1: Install vsftpd Open the terminal and enter the following command to install vsftpd: sudo

The modulo operator (%) in PHP is used to obtain the remainder of the division of two numbers. In this article, we will discuss the role and usage of the modulo operator in detail, and provide specific code examples to help readers better understand. 1. The role of the modulo operator In mathematics, when we divide an integer by another integer, we get a quotient and a remainder. For example, when we divide 10 by 3, the quotient is 3 and the remainder is 1. The modulo operator is used to obtain this remainder. 2. Usage of the modulo operator In PHP, use the % symbol to represent the modulus

MyBatisGenerator is a code generation tool officially provided by MyBatis, which can help developers quickly generate JavaBeans, Mapper interfaces and XML mapping files that conform to the database table structure. In the process of using MyBatisGenerator for code generation, the setting of configuration parameters is crucial. This article will start from the perspective of configuration parameters and deeply explore the functions of MyBatisGenerator.

Teach you step by step how to configure Maven local warehouse: improve project construction speed Maven is a powerful project management tool that is widely used in Java development. It can help us manage project dependencies, build projects, and publish projects, etc. However, during the actual development process, we sometimes encounter the problem of slow project construction. One solution is to configure a local repository to improve project build speed. This article will teach you step by step how to configure the Maven local warehouse to make your project construction more efficient. Why do you need to configure a local warehouse?
