The role of HDFS in Hadoop is to provide storage for massive data and high-throughput access to it. HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it suitable for applications with extremely large data sets.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without understanding the underlying details of distribution, making full use of a cluster's power for high-speed computing and storage.
At its core, Hadoop implements a distributed file system: the Hadoop Distributed File System (HDFS).
HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data, making it suitable for applications with large data sets. HDFS relaxes some POSIX requirements in order to allow streaming access to data in the file system.
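To illustrate the streaming model, the sketch below reads a file from HDFS as a byte stream using the standard Hadoop FileSystem Java API. The NameNode address and file path are placeholder assumptions, not values from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StreamRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; adjust for a real cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/demo/a.txt"))) {
            // Stream the file to stdout without buffering it all in memory.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```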
The core design of the Hadoop framework consists of two components: HDFS and MapReduce. HDFS provides storage for massive data, while MapReduce provides computation over that data.
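To make this division of labor concrete, here is a minimal sketch of the classic WordCount job from the Hadoop MapReduce documentation: HDFS holds the input and output files, while MapReduce does the counting. The input and output paths are assumed to be HDFS paths passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // emit (word, 1) for each token
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result); // emit (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```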
HDFS
To external clients, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. But the HDFS architecture is built on a specific set of nodes. These include the NameNode (only one), which provides metadata services within HDFS, and the DataNodes, which provide storage blocks to HDFS. Having only one NameNode is a drawback (a single point of failure) of HDFS 1.x versions. Hadoop 2.x allows two NameNodes, an active one and a standby, which solves the single-point-of-failure problem.
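The sketch below shows those client-side file operations using the Hadoop FileSystem Java API; the metadata work happens on the NameNode. The fs.defaultFS address and paths are placeholders for an actual cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; adjust for a real cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/demo"));                // create a directory
        Path src = new Path("/demo/a.txt");
        fs.create(src).close();                      // create an empty file
        fs.rename(src, new Path("/demo/b.txt"));     // rename (also moves)
        fs.delete(new Path("/demo/b.txt"), false);   // delete, non-recursive
        fs.close();
    }
}
```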
Files stored in HDFS are divided into blocks, and these blocks are replicated across multiple computers (DataNodes). This is very different from a traditional RAID architecture. The block size (64 MB by default in 1.x, 128 MB in 2.x) and the replication factor are determined by the client when the file is created. The NameNode controls all file operations. All communication within HDFS is based on the standard TCP/IP protocol.
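The FileSystem.create overload below shows the client choosing both values at file-creation time. The block size and replication factor here are illustrative, not requirements, and the NameNode address is again a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettings {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);

        short replication = 3;                 // copies of each block
        long blockSize = 128L * 1024 * 1024;   // 128 MB, the 2.x default
        int bufferSize = 4096;
        // The client passes block size and replication when creating the file.
        FSDataOutputStream out = fs.create(
                new Path("/demo/big.dat"), true, bufferSize, replication, blockSize);
        out.writeUTF("payload");
        out.close();
        fs.close();
    }
}
```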