What is the role of hdfs in hadoop?

青灯夜游
Release: 2020-09-09 10:37:06

The role of HDFS in Hadoop is to provide storage for massive data and high-throughput access to it. HDFS is highly fault tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it well suited to applications with very large data sets.

What is the role of hdfs in hadoop?

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It allows users to develop distributed programs without understanding the underlying details of distribution, making full use of the power of a cluster for high-speed computation and storage.

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).

HDFS is highly fault tolerant and designed to be deployed on low-cost hardware, and it provides high-throughput access to application data, making it suitable for applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream.
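
To make the streaming model concrete, here is a minimal sketch (in Java, against the standard org.apache.hadoop.fs API) that reads a file from HDFS sequentially as a stream. The NameNode URI hdfs://namenode:9000 and the path /user/demo/data.txt are hypothetical placeholders, not values from this article.

```java
// Minimal sketch: sequential (streaming) read of an HDFS file.
// The URI and path below are hypothetical placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Data is consumed front to back as a stream, rather than through
        // random, POSIX-style access.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
             FSDataInputStream in = fs.open(new Path("/user/demo/data.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```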

The core design of the Hadoop framework consists of HDFS and MapReduce: HDFS provides storage for massive data, while MapReduce provides computation over that data.
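
As a rough illustration of how the two halves fit together, below is the classic word-count job written against the org.apache.hadoop.mapreduce API: HDFS holds the input and output files, and MapReduce does the counting. The input and output paths are hypothetical.

```java
// Minimal sketch of the classic Hadoop word-count job: MapReduce computes
// over data that HDFS stores. The HDFS paths below are hypothetical.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emits (word, 1) for every token in each input line read from HDFS.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the partial counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output both live on HDFS; these paths are placeholders.
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```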

HDFS

To external clients, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. The architecture of HDFS, however, is built on a specific set of nodes (see Figure 1), a consequence of its design. These nodes are the NameNode, which provides metadata services within HDFS, and the DataNodes, which provide storage blocks to HDFS. Because HDFS 1.x allows only a single NameNode, that node is a single point of failure; in Hadoop 2.x, two NameNodes can run (one active, one standby), which solves the single-point-of-failure problem.
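
This division of labor is visible from the client API: requesting a directory listing is a pure metadata operation answered by the NameNode, with no DataNode involved. A minimal sketch follows; the NameNode URI is again a hypothetical placeholder.

```java
// Minimal sketch: a client asks the NameNode for metadata (a directory
// listing). The NameNode URI is a hypothetical placeholder.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf)) {
            // listStatus is served from the NameNode's metadata; no file
            // contents are read from any DataNode here.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.printf("%s\t%d bytes\treplication=%d%n",
                        status.getPath(), status.getLen(), status.getReplication());
            }
        }
    }
}
```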

Files stored in HDFS are divided into blocks, and these blocks are replicated to multiple machines (DataNodes), which is very different from a traditional RAID architecture. The block size (64 MB by default in 1.x, 128 MB in 2.x) and the replication factor are determined by the client when the file is created. The NameNode controls all file operations, and all communication within HDFS is based on the standard TCP/IP protocol.
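
Because the client chooses the block size and replication factor at creation time, both can be passed explicitly when a file is opened for writing. A minimal sketch, with a hypothetical URI and path:

```java
// Minimal sketch: the client picks block size and replication factor when
// creating a file. URI and path are hypothetical placeholders.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf)) {
            short replication = 3;                // copies kept of each block
            long blockSize = 128L * 1024 * 1024;  // 128 MB, the 2.x default
            try (FSDataOutputStream out = fs.create(
                    new Path("/user/demo/big-file.dat"),
                    true,   // overwrite if the file already exists
                    4096,   // I/O buffer size in bytes
                    replication,
                    blockSize)) {
                out.writeUTF("hello hdfs");
            }
        }
    }
}
```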
