Table of Contents
1. Resource isolation
2. Forbidding cross-queue job kills
3. Storage isolation
3.1 Current state
3.2 Solution

Hadoop resource and storage isolation

Jun 07, 2016 pm 04:39 PM

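1. Resource isolation

1.1 Current state

a. Each queue is configured with 'Min Resources' and 'Max Resources'. While a queue is idle, other queues can take resources from it, cutting into its minimum resources, and a busy queue can exceed its own maximum. If the idle queue then submits many jobs at once and does not have enough resources, and the queues that took its resources do not release them within a certain time, their jobs are forcibly killed so that the resources are released and returned to the previously idle queue.

b. Setting mapreduce.job.queuename to a queue that has more resources lets a user submit jobs across queues.

1.2 Solution

1.2.1 Forbid cross-queue submission, that is, block the use of the 'mapreduce.job.queuename' parameter to target another queue.

1.2.2 Configuration steps

a. Add the following parameters to fair_scheduler.xml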

<aclSubmitApps>dd001</aclSubmitApps>                --- dd001 is the user
<aclAdministerApps>dd001</aclAdministerApps>

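Description: aclSubmitApps: the list of Linux users or groups that may submit applications to the queue. The default is "*", which means any user may submit applications to the queue.

Note that this property is inherited: a child queue's list inherits its parent queue's list. When configuring it, separate users from each other, and groups from each other, with ",", and separate the user list from the group list with a single space, for example "user1,user2 group1,group2".

aclAdministerApps: the list of administrators of the queue. A queue administrator can manage the resources and applications in the queue, for example kill any application in it.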
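
For orientation, both ACL elements sit inside a queue element of the Fair Scheduler allocation file. The sketch below writes a minimal example to a temporary file. The queue name dd001 comes from the text above; the file path and the restrictive root-queue ACLs are assumptions for illustration. Because ACLs are inherited, a root queue left at the default "*" would still let every user submit to every child queue, so the sketch sets the root ACLs to a single space, which means nobody.

```shell
#!/bin/sh
# Sketch only: writes an example Fair Scheduler allocation file to a temp path.
# The path and the root-queue ACLs are illustrative assumptions.
ALLOC_FILE="${TMPDIR:-/tmp}/fair-scheduler-example.xml"

cat > "$ALLOC_FILE" <<'EOF'
<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <!-- A single space means "nobody"; child queues then grant access explicitly. -->
    <aclSubmitApps> </aclSubmitApps>
    <aclAdministerApps> </aclAdministerApps>
    <queue name="dd001">
      <!-- Only user dd001 may submit to, or administer, this queue. -->
      <aclSubmitApps>dd001</aclSubmitApps>
      <aclAdministerApps>dd001</aclAdministerApps>
    </queue>
  </queue>
</allocations>
EOF

echo "wrote $ALLOC_FILE"
```

With a layout like this, a job submitted by another user with mapreduce.job.queuename=dd001 should be rejected at submission time, provided ACL checking is enabled on the cluster.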

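2. Forbidding cross-queue job kills

2.1 Current state

a. yarn.admin.acl is set to '*', which means every user can kill other users' jobs.

2.2 Solution

2.2.1 Forbid cross-queue kills, so that apart from the superuser, users can kill only the jobs in their own queue.

2.2.2 Configuration steps

a. Add the following parameter to mapred_site.xml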

<property>
  <name>mapreduce.cluster.acls.enabled</name>
  <value>true</value>
</property>

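b. Add the following parameters to yarn-site.xml

<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.admin.acl</name>
  <value>hadp</value>
</property>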

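c. Add the following parameter to core-site.xml ----- this prevents access-permission problems in the front-end cluster applications UI

<property>
  <name>hadoop.http.staticuser.user</name>
  <value>hadp</value>
</property>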
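
After editing the three files, it can help to confirm that each property carries the intended value. The following is a minimal sketch of such a check. get_prop is a hypothetical helper (not a Hadoop tool) that assumes each name and value element fits on one line, and the sample file is created only so the snippet runs on its own.

```shell
#!/bin/sh
# get_prop FILE NAME: print the <value> of property NAME from a Hadoop *-site.xml.
# Hypothetical helper; assumes <name>...</name> and <value>...</value> each fit on one line.
get_prop() {
  awk -v want="$2" '
    /<name>/  { cur = $0; gsub(/.*<name>|<\/name>.*/, "", cur) }
    /<value>/ { if (cur == want) { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); print v; exit } }
  ' "$1"
}

# Self-contained demo: a sample file holding the values used in this section.
SAMPLE="${TMPDIR:-/tmp}/yarn-site-sample.xml"
cat > "$SAMPLE" <<'EOF'
<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.admin.acl</name>
    <value>hadp</value>
  </property>
</configuration>
EOF

get_prop "$SAMPLE" yarn.acl.enable   # prints true
get_prop "$SAMPLE" yarn.admin.acl    # prints hadp
```

On a live cluster, hdfs getconf -confKey <key> reports the value that the local client configuration resolves, which is a useful cross-check against the files themselves.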

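3. Storage isolation

3.1 Current state

a. Each user has write permission only on the directories under their own user directory, but directory size has no upper limit. Some users may therefore write without limit while others are left with no space to write.

3.2 Solution

3.2.1 Configure a size limit on each user's directory according to the size of the business.

a. Attributes of a directory with no quota set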

[dd001@test_12123 ~]$ hdfs dfs -count -q hdfs://ns1/user/dd001/warehouse/test_lh
none inf none inf 1 0 0 hdfs://ns1/user/dd_edw/warehouse/test_lh

Columns, left to right: file-count quota, remaining file-count quota, space quota, remaining space quota, directory count, file count, total size, file/directory name.
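
To use these columns in a script, the line can be split on whitespace. The sketch below labels the eight fields of one hdfs dfs -count -q output line. describe_count_line is a hypothetical helper name, and the sample line is copied from the output above rather than taken from a live query.

```shell
#!/bin/sh
# describe_count_line LINE: label the eight whitespace-separated fields of one
# `hdfs dfs -count -q` output line. Hypothetical helper for illustration.
describe_count_line() {
  printf '%s\n' "$1" | awk '{
    print "file_quota=" $1
    print "remaining_file_quota=" $2
    print "space_quota=" $3
    print "remaining_space_quota=" $4
    print "dir_count=" $5
    print "file_count=" $6
    print "total_size=" $7
    print "path=" $8
  }'
}

# Sample line copied from the output above (no quota set yet).
describe_count_line "none inf none inf 1 0 0 hdfs://ns1/user/dd_edw/warehouse/test_lh"
```

A value of none or inf in the quota columns means that no quota is set, so a monitoring script should treat those as "unlimited" rather than as numbers.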

b. Command to set the quota

[dd001@test_12123 ~]$ hdfs dfsadmin -setSpaceQuota 400 hdfs://ns1/user/dd001/warehouse/test_lh

c. Attributes after the quota is set

[dd001@test_12123 ~]$ hdfs dfs -count -q hdfs://ns1/user/dd001/warehouse/test_lh
none inf 400 400 1 0 0 hdfs://ns1/user/dd_edw/warehouse/test_lh

d. Test what happens when the directory exceeds its quota

[dd001@test_12123 ~]$ hdfs dfs -cp hdfs://ns1/user/dd001/warehouse/000026_0.lzo hdfs://ns1/user/dd001/warehouse/test_lh
14/10/04 17:54:14 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/dd_edw/warehouse/test_lh is exceeded: quota = 400 B = 400 B but diskspace consumed = 402653184 B = 384 MB
at org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:191)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2054)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1789)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1764)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:357)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:2847)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2508)
at org.apache.hadoop.hd

The cp fails with an error because the file needs more space than the quota allows.
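
The 384 MB in the error message is not the size of the copied file. HDFS checks the space quota when it allocates a block, and at that point it charges a full block for every replica. Assuming the common defaults of a 128 MB block size and a replication factor of 3 (neither is stated in the original, but together they reproduce the figure in the log), the arithmetic can be sketched as follows; min_space_for_one_block is a hypothetical helper name.

```shell
#!/bin/sh
# min_space_for_one_block BLOCK_MB REPLICATION: bytes of space quota that must be
# free before HDFS can allocate one more block. Hypothetical helper name.
min_space_for_one_block() {
  echo $(( $1 * 1024 * 1024 * $2 ))
}

# Assumed defaults: 128 MB blocks, 3 replicas.
min_space_for_one_block 128 3   # prints 402653184
```

A quota of 400 bytes therefore rejects any write at all; a usable quota has to leave room for at least one full replicated block. The -setSpaceQuota option also accepts size suffixes, so a limit can be written as, for example, 10g or 2t instead of a raw byte count.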

e. Command to remove the quota

[dd001@test_12123 ~]$ hdfs dfsadmin -clrSpaceQuota hdfs://ns1/user/dd001/warehouse/test_lh

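3.3 Monitoring

Adding a quota takes only one command, but limiting storage is a means, not the goal. The real goal is to make fuller use of resources and to keep directories from reaching their quota in the first place, rather than to let jobs fail. Good monitoring is therefore the first priority.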

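3.3.1 Resource allocation

Queue name | User machines | Total machine quota (T) | Total machines allocated in the cluster | Average quota (T) | Disk reserve (T) | Actual quota
dd001 | 20 | 21 | 20 | 20.9715 | 0.0488 | 418.454

a. Average quota = total quota / total machines allocated in the cluster.

Actual quota = (average quota - disk reserve) * machines.

b. Alarm value = actual quota * 0.8.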
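
The two formulas can be checked against the dd001 row with a few lines of shell. This is a minimal sketch: the function names are hypothetical, the 350 T usage figure is an invented sample, and awk is used only because plain sh has no floating-point arithmetic.

```shell
#!/bin/sh
export LC_ALL=C   # keep "." as the decimal separator

# actual_quota AVG_T RESERVE_T MACHINES: (average quota - disk reserve) * machines, in T.
actual_quota() {
  awk -v a="$1" -v r="$2" -v m="$3" 'BEGIN { printf "%.3f\n", (a - r) * m }'
}

# alarm_value QUOTA_T: 80% of the actual quota, in T.
alarm_value() {
  awk -v q="$1" 'BEGIN { printf "%.3f\n", q * 0.8 }'
}

# is_over_alarm USED_T ALARM_T: exit status 0 when usage has reached the alarm value.
is_over_alarm() {
  awk -v u="$1" -v t="$2" 'BEGIN { exit !(u >= t) }'
}

quota=$(actual_quota 20.9715 0.0488 20)   # figures from the dd001 row
alarm=$(alarm_value "$quota")
echo "actual quota: $quota T, alarm at: $alarm T"

# 350 T is a made-up usage sample, not a measurement from the original.
if is_over_alarm 350 "$alarm"; then echo "350 T used: alarm"; fi
```

For the dd001 row this gives an actual quota of 418.454 T and an alarm value of 334.763 T. In practice the usage figure would come from the space quota and remaining space quota columns of hdfs dfs -count -q.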

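3.3.2 Handling a disk alarm

a. Delete redundant data.

b. Add machines.

When machines are added, the queue's memory and CPU settings also need to be adjusted accordingly.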

<minResources>401 mb,19vcores</minResources>
<maxResources>401 mb,19vcores</maxResources>

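Adjust these two parameters accordingly. The commands for adjusting the quota are as follows:

a. hdfs dfsadmin -clrSpaceQuota hdfs://ns1/user/dd001/warehouse/test_lh ---remove the old quota

b. hdfs dfsadmin -setSpaceQuota <actual quota> hdfs://ns1/user/dd001/warehouse/test_lh ---set the new quota.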
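
Because -setSpaceQuota takes a byte count (or a value with a size suffix), a quota computed in T has to be converted before it goes on the command line. A minimal sketch, assuming that T here means 1024^4 bytes; quota_commands is a hypothetical helper that only prints the two commands so they can be reviewed before anyone runs them, and the 1 T argument is an arbitrary sample value.

```shell
#!/bin/sh
export LC_ALL=C   # keep "." as the decimal separator

# quota_commands QUOTA_T DIR: print the clear/set command pair for a quota given in T.
# Hypothetical dry-run helper; assumes 1 T = 1024^4 bytes.
quota_commands() {
  bytes=$(awk -v q="$1" 'BEGIN { printf "%.0f", q * 1024 * 1024 * 1024 * 1024 }')
  echo "hdfs dfsadmin -clrSpaceQuota $2"
  echo "hdfs dfsadmin -setSpaceQuota $bytes $2"
}

quota_commands 1 hdfs://ns1/user/dd001/warehouse/test_lh
```

HDFS counts every replica of a block against the space quota, so the figure passed here is raw disk space, which matches a quota derived from machine disk capacity as in 3.3.1.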

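c. How much quota to add, that is, how many machines to add

c.1 Average daily growth of directory storage = sum(daily growth) / count(1)

c.2 Machines = (days of storage the disks should cover * average daily growth of directory storage) / (average quota - disk reserve)

c.3 Example:

Assume the average daily growth of directory storage is 0.5 T.

Machines = (90 * 0.5) / (18.4279 - 0.0488) = 45 / 18.3791 ≈ 2.45, rounded up to 3 machines.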
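
Formulas c.1 and c.2 can be sketched in shell as follows. The function names are hypothetical, the three growth samples passed to avg_daily_growth are invented for the demo, and the result is rounded up because a fraction of a machine cannot be added.

```shell
#!/bin/sh
export LC_ALL=C   # keep "." as the decimal separator

# avg_daily_growth G1 G2 ...: mean of the daily growth samples (T per day), formula c.1.
avg_daily_growth() {
  printf '%s\n' "$@" | awk '{ s += $1; n++ } END { if (n) printf "%.4f\n", s / n }'
}

# machines_needed DAYS GROWTH_T AVG_QUOTA_T RESERVE_T: formula c.2, rounded up.
machines_needed() {
  awk -v d="$1" -v g="$2" -v a="$3" -v r="$4" 'BEGIN {
    x = (d * g) / (a - r)
    n = int(x); if (n < x) n++      # ceiling for positive x
    print n
  }'
}

avg_daily_growth 0.4 0.5 0.6            # made-up samples
machines_needed 90 0.5 18.4279 0.0488   # prints 3
```

The second call reproduces the example in c.3: 45 / 18.3791 is about 2.45, which rounds up to 3 machines.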

