Table of Contents
HDFS design principles
Where HDFS is not a good fit
HDFS blocks
namenodes and datanodes
HDFS Federation
HDFS high availability
failover and fencing

Hadoop in Depth (1): An Introduction to HDFS

Jun 07, 2016 pm 04:30 PM


Please credit the source when reposting: http://blog.csdn.net/lastsweetop/article/details/8992505

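HDFS design principles

1. Very large files:

"Very large" here means files of hundreds of megabytes, gigabytes, or terabytes. Yahoo's Hadoop clusters already store petabytes of data.

2. Streaming data access:

HDFS is built around a write-once, read-many access pattern.

3. Commodity hardware:

HDFS achieves high availability in software, so it does not need expensive hardware to guarantee it. Ordinary PCs or virtual machines from any vendor are enough.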

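Where HDFS is not a good fit

1. Low-latency data access

HDFS is optimized for high-throughput transfer of large amounts of data, not for low latency. If you need responses in under about 10 ms, HDFS is the wrong tool, although HBase can fill that gap.

2. Lots of small files

The namenode holds the metadata for the entire file system in memory, so the number of files is limited by the namenode's memory. As a rule of thumb, each file and each block takes about 150 bytes of metadata. One million files that each occupy one block are two million objects, which needs about 300 MB of memory. You can work out for yourself how many files your own server can hold.

3. Multiple writers and random modification

Writes by multiple writers, and modifications at arbitrary offsets in a file, are not currently supported.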
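The small-files estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 150-byte figure is the rule of thumb from the text, and the function name `namenode_metadata_bytes` is made up for illustration. Real memory use depends on the Hadoop version, path lengths, and JVM overhead.

```python
# Rough namenode memory estimate for file-system metadata.
# Assumption: about 150 bytes per namespace object (one per file, one per
# block), as in the rule of thumb above. Directories are ignored here.
BYTES_PER_OBJECT = 150


def namenode_metadata_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate metadata memory: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    for files in (1_000_000, 100_000_000):
        mb = namenode_metadata_bytes(files) / 1_000_000
        print(f"{files:,} single-block files need roughly {mb:,.0f} MB")
```

The estimate grows linearly with the number of files, which is why many small files hurt far more than a few large ones holding the same amount of data.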

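HDFS blocks

To keep seek time small relative to transfer time, HDFS blocks are much larger than disk blocks. The HDFS block size defaults to 64 MB. Unlike a block in an ordinary file system, a file smaller than the block size does not occupy a full block of underlying storage.

With a seek time of about 10 ms and a transfer rate of about 100 MB/s, the block size has to be about 100 MB for the seek time to be 1% of the transfer time. In practice it is usually set to 128 MB.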
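The block-size reasoning above is a one-line calculation. The sketch below spells it out; the function and parameter names are illustrative only, and the 10 ms and 100 MB/s inputs are the approximate figures quoted in the text rather than measurements.

```python
def block_size_for_seek_fraction(seek_time_s: float,
                                 transfer_rate_mb_s: float,
                                 seek_fraction: float) -> float:
    """Block size in MB for which seek time is `seek_fraction` of transfer time."""
    # Transfer time needed so that the seek is only `seek_fraction` of it.
    transfer_time_s = seek_time_s / seek_fraction
    # Amount of data that can be moved in that time.
    return transfer_rate_mb_s * transfer_time_s


if __name__ == "__main__":
    size = block_size_for_seek_fraction(0.010, 100.0, 0.01)
    print(f"block size of about {size:.0f} MB")
```

Faster disks raise the transfer rate while seek time stays roughly the same, so the same 1% target pushes the block size up over time.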


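The block abstraction gives HDFS three advantages:

1. A file can be larger than any single disk.

2. Storing blocks is simpler than storing files, because blocks are all essentially the same size.

3. Blocks are a better unit than files for fault tolerance and high availability.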
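To make the block abstraction concrete, here is a toy sketch of how a file's byte range divides into blocks. It is not HDFS code: `split_into_blocks` is a hypothetical helper and 128 MB is assumed as the block size. Every block except possibly the last has the full size, and a file smaller than one block yields a single short block.

```python
from typing import List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB


def split_into_blocks(file_size: int,
                      block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return an (offset, length) pair for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block only covers the bytes that remain.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    mb = 1024 * 1024
    for offset, length in split_into_blocks(300 * mb):
        print(f"offset {offset // mb} MB, length {length // mb} MB")
```

Because each block is an independent unit, the blocks of one file can sit on different disks and machines, which is what lets a file outgrow any single disk.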

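namenodes and datanodes

An HDFS cluster has two types of node: a master, the namenode, and workers, the datanodes.

The namenode manages the file system namespace. It maintains the file system tree and the metadata for every file and directory in the tree. This information is stored on local disk in two files, the namespace image and the edit log. The namenode also knows which blocks make up each file and which datanodes hold those blocks, but block locations are not stored on disk. They are rebuilt in the namenode's memory from the datanodes' reports when the system starts.

The datanodes are the workhorses of the file system. They store and retrieve blocks as instructed by the namenode and by clients, and they periodically report to the namenode the list of blocks they are storing.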
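The split between persisted namespace and in-memory block locations can be sketched with a toy model. Everything here is hypothetical (`ToyNameNode`, the block IDs, the datanode names) and the real namenode is far more involved. The point is only that the block-to-datanode map is built entirely from block reports.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyNameNode:
    """Toy model: namespace is persisted, block locations live only in memory."""

    def __init__(self, namespace: Dict[str, List[str]]):
        # Persisted in the real system: path -> ordered list of block IDs.
        self.namespace = namespace
        # Never persisted: block ID -> datanodes currently holding a replica.
        self.block_locations: Dict[str, Set[str]] = defaultdict(set)

    def receive_block_report(self, datanode: str, block_ids: Iterable[str]) -> None:
        """Replace whatever this datanode reported previously."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path: str) -> List[List[str]]:
        """For each block of the file, list the datanodes that hold it."""
        return [sorted(self.block_locations.get(block_id, set()))
                for block_id in self.namespace[path]]


if __name__ == "__main__":
    nn = ToyNameNode({"/logs/a": ["blk_1", "blk_2"]})
    nn.receive_block_report("dn1", ["blk_1", "blk_2"])
    nn.receive_block_report("dn2", ["blk_2"])
    print(nn.locate("/logs/a"))
```

A freshly started `ToyNameNode` knows every file and its block IDs but has no locations at all until the datanodes report in, which mirrors the startup behaviour described above.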


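If the namenode becomes unavailable, the whole of HDFS is unusable. There are two ways to guard against this:

1. Configure the namenode to write its metadata to multiple disks, ideally independent disks or an NFS mount.

2. Run a secondary namenode. Despite its name, it does not act as a namenode in normal operation. Its main job is to periodically merge the namespace image with the edit log so that the edit log does not grow too large. It also keeps a copy of the merged image, so it can be promoted if the namenode dies. Because this copy is not kept up to date in real time, some data loss is very likely.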
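The checkpoint that the secondary namenode performs amounts to replaying the edit log on top of the last image. The sketch below models the namespace as a plain set of paths with only two operations. These are simplifying assumptions for illustration, not the real image or edit-log formats.

```python
from typing import List, Set, Tuple

Edit = Tuple[str, str]  # (operation, path); only "create" and "delete" here


def checkpoint(image: Set[str], edit_log: List[Edit]) -> Set[str]:
    """Return a new image with every logged edit applied in order."""
    merged = set(image)  # copy, so the old image is left untouched
    for operation, path in edit_log:
        if operation == "create":
            merged.add(path)
        elif operation == "delete":
            merged.discard(path)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return merged


if __name__ == "__main__":
    image = {"/a"}
    edits = [("create", "/b"), ("delete", "/a")]
    new_image = checkpoint(image, edits)
    print(sorted(new_image))
    # Edits logged after this point are missing from the secondary's copy
    # until the next checkpoint, which is where the data loss comes from.
```

After a checkpoint the edit log can be truncated, which is what keeps it from growing without bound.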


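HDFS Federation

The namenode keeps a reference to every file and block in memory, so on a very large cluster with a great many files, memory becomes the limiting factor. HDFS Federation, introduced in Hadoop 2.x, allows a cluster to have multiple namenodes, each managing a portion of the namespace: for example, one manages /usr and another manages /home. The namenodes are isolated from one another, so the failure of one does not affect the others.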
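One way to picture federation is as a table that maps each part of the namespace to the namenode responsible for it. The sketch below does a longest-prefix lookup over such a table. The host names and the `resolve_namenode` helper are hypothetical, and this is not how any real HDFS client is implemented.

```python
from typing import Dict


def resolve_namenode(path: str, mount_table: Dict[str, str]) -> str:
    """Pick the namenode whose namespace prefix is the longest match for path."""
    best = None
    for prefix in mount_table:
        # Match the prefix itself or anything below it, but not "/usrdata".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namenode is responsible for {path}")
    return mount_table[best]


if __name__ == "__main__":
    # Hypothetical layout matching the example in the text.
    table = {"/usr": "nn1.example.com", "/home": "nn2.example.com"}
    print(resolve_namenode("/usr/lib/data.bin", table))
    print(resolve_namenode("/home/alice/notes.txt", table))
```

Because each prefix belongs to exactly one namenode, losing `nn1.example.com` in this sketch only affects lookups under /usr.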


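HDFS high availability

Neither backing up the namenode metadata nor running a secondary namenode provides high availability. They only make it possible to recover the data. The namenode is still a single point of failure: once it goes down, an administrator has to restore from the backup by hand or bring up the secondary node. Before the new namenode can serve requests, it has to do three things:

1. Load the namespace image into memory.

2. Replay the edit log.

3. Receive block reports from the datanodes.

On a fairly large HDFS cluster this takes about 30 minutes.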


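Hadoop 2.x supports high availability (HA) through a pair of namenodes in an active-standby configuration: if the active one fails, the standby takes over right away so that service is not interrupted.

HA requires three changes:

1. The namenodes must use highly available shared storage.

2. The datanodes must send block reports to both namenodes.

3. Clients must be changed so that they switch to the available namenode when a failure occurs, transparently to the user.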
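The third change, client-side failover, can be sketched as "try each configured namenode in turn". This is a toy illustration under stated assumptions: `NameNodeUnavailable`, `call_with_failover`, and the addresses are invented for the example, and a real HDFS client handles retries and standby detection in its own way.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class NameNodeUnavailable(Exception):
    """Raised by a request when the contacted namenode cannot serve it."""


def call_with_failover(namenodes: List[str], request: Callable[[str], T]) -> T:
    """Send the request to each namenode in order until one answers."""
    last_error = None
    for address in namenodes:
        try:
            return request(address)
        except NameNodeUnavailable as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    raise NameNodeUnavailable("no namenode reachable") from last_error


if __name__ == "__main__":
    def list_root(address: str) -> str:
        # Stand-in for a real RPC: pretend the first namenode is down.
        if address == "nn1.example.com:8020":
            raise NameNodeUnavailable(address)
        return f"listing from {address}"

    print(call_with_failover(["nn1.example.com:8020", "nn2.example.com:8020"],
                             list_root))
```

The caller of `call_with_failover` never sees which namenode answered, which is the sense in which failover is invisible to the user.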


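failover and fencing

The transition that activates the standby namenode is called failover, and the component that manages it is called the failover controller. ZooKeeper can be used for this role, and it ensures that only one namenode is active at a time.

Before failing over, you have to be sure that the original namenode has really stopped. Often it is only slowed down by network delay. If it resumes serving after the standby has been activated, two namenodes are active at once, which must not happen. Preventing the original namenode from coming back is called fencing. It can be implemented with STONITH ("shoot the other node in the head"), which fences the original namenode by cutting its power.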
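The ordering that matters in the description above is "fence first, then activate". The sketch below captures just that rule. `ToyFailoverController` and the `fence` callback are hypothetical names, and the callback stands in for whatever mechanism (such as STONITH) actually stops the old namenode.

```python
from typing import Callable


class ToyFailoverController:
    """Toy controller that never activates the standby before fencing succeeds."""

    def __init__(self, active: str, standby: str, fence: Callable[[str], bool]):
        self.active = active
        self.standby = standby
        # fence(node) must return True only if the node is known to be stopped.
        self.fence = fence

    def failover(self) -> str:
        """Fence the current active node, then swap roles; return the new active."""
        if not self.fence(self.active):
            # Without a successful fence there could be two active namenodes.
            raise RuntimeError("fencing failed; refusing to activate the standby")
        self.active, self.standby = self.standby, self.active
        return self.active


if __name__ == "__main__":
    controller = ToyFailoverController("nn1", "nn2", fence=lambda node: True)
    print("new active namenode:", controller.failover())
```

If the fence cannot be confirmed, the controller in this sketch leaves the roles unchanged, trading availability for the guarantee that only one namenode is ever active.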




Author: lastsweetop. Published 2013-5-31 15:31:20.
