The role of HDFS in Hadoop is to provide storage for massive data and high-throughput access to it. HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it suitable for applications with extremely large data sets.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without understanding the underlying details of distribution, making full use of a cluster's power for high-speed computing and storage.
At its core, Hadoop implements a distributed file system: the Hadoop Distributed File System (HDFS).
HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data, making it suitable for applications with large data sets. HDFS relaxes some POSIX requirements in order to allow streaming access to data in the file system.
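To illustrate the streaming model, the sketch below reads a file from HDFS as a byte stream using the standard Hadoop FileSystem Java API. The NameNode address and file path are placeholder assumptions, not values from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StreamRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; adjust for a real cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/demo/a.txt"))) {
            // Stream the file to stdout without buffering it all in memory.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```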
The core design of the Hadoop framework consists of two components: HDFS and MapReduce. HDFS provides storage for massive data, while MapReduce provides computation over that data.
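To make this division of labor concrete, here is a minimal sketch of the classic WordCount job from the Hadoop MapReduce documentation: HDFS holds the input and output files, while MapReduce does the counting. The input and output paths are assumed to be HDFS paths passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // emit (word, 1) for each token
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result); // emit (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```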
HDFS
To external clients, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. But the HDFS architecture is built on a specific set of nodes. These include the NameNode (only one), which provides metadata services within HDFS, and the DataNodes, which provide storage blocks to HDFS. Having only one NameNode is a drawback (a single point of failure) of HDFS 1.x versions. Hadoop 2.x allows two NameNodes, an active one and a standby, which solves the single-point-of-failure problem.
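The sketch below shows those client-side file operations using the Hadoop FileSystem Java API; the metadata work happens on the NameNode. The fs.defaultFS address and paths are placeholders for an actual cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; adjust for a real cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/demo"));                // create a directory
        Path src = new Path("/demo/a.txt");
        fs.create(src).close();                      // create an empty file
        fs.rename(src, new Path("/demo/b.txt"));     // rename (also moves)
        fs.delete(new Path("/demo/b.txt"), false);   // delete, non-recursive
        fs.close();
    }
}
```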
Files stored in HDFS are divided into blocks, and these blocks are replicated across multiple computers (DataNodes). This is very different from a traditional RAID architecture. The block size (64 MB by default in 1.x, 128 MB in 2.x) and the replication factor are determined by the client when the file is created. The NameNode controls all file operations. All communication within HDFS is based on the standard TCP/IP protocol.
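The FileSystem.create overload below shows the client choosing both values at file-creation time. The block size and replication factor here are illustrative, not requirements, and the NameNode address is again a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettings {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);

        short replication = 3;                 // copies of each block
        long blockSize = 128L * 1024 * 1024;   // 128 MB, the 2.x default
        int bufferSize = 4096;
        // The client passes block size and replication when creating the file.
        FSDataOutputStream out = fs.create(
                new Path("/demo/big.dat"), true, bufferSize, replication, blockSize);
        out.writeUTF("payload");
        out.close();
        fs.close();
    }
}
```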