What services do hive components provide?

青灯夜游

Nov 18, 2022 am 11:57 AM


Hive components provide the following services: 1. converting SQL statements into MapReduce code; 2. storing data, using HDFS; 3. computing on data, using MapReduce. Hive is a data warehouse tool based on Hadoop, used for data extraction, transformation, and loading. It can map structured data files to database tables and provides SQL query functions, converting SQL statements into MapReduce tasks for execution.


The operating environment of this tutorial: Windows 7 system, Dell G3 computer.

When building a data warehouse, the Hive component plays a very key role. We know that Hive is an important data warehouse tool based on Hadoop, but how to apply it requires further exploration.

What is Hive

Hive is a data warehouse tool based on Hadoop, used for data extraction, transformation, and loading. It is a mechanism for storing, querying, and analyzing large-scale data held in Hadoop. Hive can map structured data files to database tables and provides SQL query functions, converting SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: SQL-like statements are enough to run MapReduce statistics quickly, without developing a dedicated MapReduce application. This makes Hive well suited to statistical analysis in data warehouses.
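The idea of mapping a structured data file to a table can be illustrated with a minimal Python sketch. This is not Hive code: the schema, the column names (`user_id`, `city`, `amount`), and the tab delimiter are all assumptions invented for the example. It only shows a schema being applied to raw text lines at read time.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (column name, Python type) pairs, invented for illustration.
SCHEMA: List[Tuple[str, type]] = [("user_id", int), ("city", str), ("amount", float)]


def read_rows(lines: List[str], delimiter: str = "\t") -> List[Dict[str, object]]:
    """Apply SCHEMA to raw delimited lines when they are read."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Pair each declared column with the field at the same position and cast it.
        row = {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}
        rows.append(row)
    return rows


# Stand-in for the lines of a delimited text file stored in HDFS.
raw_file = ["1\tBeijing\t20.5\n", "2\tShanghai\t7.0\n", "3\tBeijing\t4.5\n"]
table = read_rows(raw_file)
```

The file itself stays plain text; the table structure exists only in the schema that is applied when the rows are read.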

What is Hive used for

1. Convert SQL statements into MapReduce code
2. Store data, using HDFS
3. Compute on data, using MapReduce
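To make the first point concrete, here is a hedged Python sketch of how a simple aggregation query could be expressed as map, shuffle, and reduce steps. It imitates the MapReduce model in a single process and is not what Hive actually generates; the `orders` data and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Query being imitated (HQL-style):
#   SELECT city, SUM(amount) FROM orders GROUP BY city;

Row = Dict[str, object]


def map_phase(rows: Iterable[Row]) -> List[Tuple[str, float]]:
    """Map: emit one (group key, value) pair per input row."""
    return [(str(r["city"]), float(r["amount"])) for r in rows]


def shuffle_phase(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: collect all values that share the same key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: aggregate the values of each key (here, SUM)."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical sample rows.
orders = [
    {"city": "Beijing", "amount": 20.5},
    {"city": "Shanghai", "amount": 7.0},
    {"city": "Beijing", "amount": 4.5},
]
result = reduce_phase(shuffle_phase(map_phase(orders)))
```

The GROUP BY column becomes the map output key and the aggregate function becomes the reducer, which is the part a user would otherwise have to write by hand.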

What are the advantages of Hive

a. Advantages of Hive

(1) Simple and easy to use: Provides the SQL-like query language HQL.

(2) Scalable: Provides computing and expansion capabilities designed for extremely large data sets (MR as the computing engine, HDFS as the storage system). Hive can freely expand the scale of the cluster, generally without restarting the service.

(3) Provides unified metadata management.

(4) Extensible: Hive supports user-defined functions, so users can implement their own functions according to their needs.

(5) Fault tolerant: If a node has a problem, the SQL can still complete execution.

b. Disadvantages of Hive

(1) HQL's expressive power is limited

1) Iterative algorithms such as PageRank cannot be expressed
2) It is not well suited to data mining, such as k-means

(2) Hive's efficiency is relatively low

1) The MapReduce jobs automatically generated by Hive are usually not intelligent enough
2) Hive tuning is difficult and the granularity is coarse
3) Hive has poor controllability

(3) Hive does not support transactions in the way an OLTP database does. It is mainly used for OLAP (Online Analytical Processing).


1) The data processed by Hive is stored in HDFS

2) By default, Hive's underlying implementation for analyzing data is MapReduce

3) The execution programs run on YARN

Summary: In effect, Hive acts as a client of Hadoop.

Why use Hive

(1) Comparison between Hive and traditional database


Hive is used for offline data analysis of massive data. Hive has the appearance of a SQL database, but the application scenarios are completely different. Hive is only suitable for statistical analysis of batch data.

(2) Advantages of Hive

Hive uses HDFS to store data and MapReduce to query and analyze it. Processing data directly with Hadoop MapReduce carries a high learning cost, and developing complex query logic in MapReduce is difficult. Hive's operation interface uses SQL-like syntax, which provides rapid development capabilities and avoids writing MapReduce by hand, reducing developers' learning costs and making it easier to extend functionality.

What problem does Hive solve?

Hive solves the problem of querying big data, so that people who cannot write MR can still use MR. In essence, it converts HQL into MR, and its bottom layer is MR. Writing MR by hand is inefficient and painful, and the emergence of Hive gave Java EE developers a shortcut.

Hive architecture principle


1. User interface: Client

CLI (Hive shell), JDBC/ODBC (Java access to Hive), Web UI (browser access to Hive)

2. Metadata: Metastore

Metadata includes: table name, the database the table belongs to (the default is "default"), table owner, column and partition fields, table type (for example, whether it is an external table), the directory where the table data is located, and so on.

Metadata is stored in the built-in Derby database by default. It is recommended to use MySQL to store the Metastore.
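As a rough illustration of the kind of record described above, the following Python sketch models one table's metadata as a dataclass. The class and field names are assumptions chosen to mirror the list in the text; they do not reflect the real Metastore schema, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TableMetadata:
    """Illustrative stand-in for one table's entry; not the real Metastore schema."""
    table_name: str
    location: str                          # directory where the table data is located
    columns: List[Tuple[str, str]]         # (column name, type name) pairs
    database: str = "default"              # database the table belongs to
    owner: str = "unknown"                 # table owner
    partition_columns: List[Tuple[str, str]] = field(default_factory=list)
    is_external: bool = False              # table type: external or not


# Hypothetical example entry.
orders_meta = TableMetadata(
    table_name="orders",
    location="/warehouse/orders",
    columns=[("user_id", "int"), ("city", "string"), ("amount", "double")],
    partition_columns=[("dt", "string")],
)
```

Note that the record describes the table but holds none of its rows: the data stays in the directory named by `location`.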

3. Hadoop

Uses HDFS for storage and MapReduce for calculation.

4. Driver: Driver

(1) Parser (SQL Parser): Converts the SQL string into an abstract syntax tree (AST). This step is generally completed with a third-party tool library such as ANTLR. The AST is then analyzed, for example to check whether the tables exist, whether the fields exist, and whether the SQL semantics are correct.

(2) Compiler (Physical Plan): Compiles the AST to generate a logical execution plan.

(3) Optimizer (Query Optimizer): Optimizes the logical execution plan.

(4) Execution: Converts the logical execution plan into a physical plan that can be run. For Hive, this is MR/Spark.
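The four driver stages can be sketched as a toy pipeline in Python. This is only an illustration of the flow from SQL string to runnable plan: it handles nothing beyond `SELECT cols FROM table`, the `CATALOG` dictionary is a hypothetical stand-in for the Metastore, column pruning is used as one example of an optimization, and none of the function names come from Hive.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical catalog standing in for the Metastore: table name -> column names.
CATALOG: Dict[str, List[str]] = {"orders": ["user_id", "city", "amount"]}


def parse(sql: str) -> Dict[str, object]:
    """Parser: build a tiny AST and check that the table and columns exist."""
    m = re.fullmatch(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*", sql, flags=re.IGNORECASE)
    if m is None:
        raise ValueError("only 'SELECT cols FROM table' is supported in this sketch")
    columns = [c.strip() for c in m.group(1).split(",")]
    table = m.group(2)
    if table not in CATALOG:
        raise ValueError(f"unknown table: {table}")
    for column in columns:
        if column not in CATALOG[table]:
            raise ValueError(f"unknown column: {column}")
    return {"type": "select", "columns": columns, "table": table}


def compile_plan(ast: Dict[str, object]) -> List[Tuple]:
    """Compiler: turn the AST into a logical plan (an ordered list of operators)."""
    table = str(ast["table"])
    return [("scan", table, list(CATALOG[table])), ("project", list(ast["columns"]))]


def optimize(plan: List[Tuple]) -> List[Tuple]:
    """Optimizer: column pruning, so the scan reads only the projected columns."""
    scan, project = plan
    return [("scan", scan[1], list(project[1])), project]


def to_physical(plan: List[Tuple]) -> str:
    """Execution: describe the job that would run (a real engine would submit it)."""
    scan, _project = plan
    return f"map-only job: read {', '.join(scan[2])} from {scan[1]}"


job = to_physical(optimize(compile_plan(parse("SELECT city, amount FROM orders"))))
```

Each stage consumes the previous stage's output, so an error such as an unknown column is reported by the parser before any plan is built.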


Comparison between Hive and database

Because Hive uses the SQL-like query language HQL (Hive Query Language), it is easy to mistake Hive for a database. In fact, from a structural point of view, Hive and databases have little in common apart from similar query languages. This section explains the differences between Hive and databases from several aspects. Databases can be used in online applications, whereas Hive is designed for data warehouses. Knowing this helps in understanding the characteristics of Hive from an application perspective.

1. Query language: Since SQL is widely used in data warehouses, the SQL-like query language HQL is designed specifically for the characteristics of Hive. Developers who are familiar with SQL development can easily use Hive for development.

2. Data storage location: Hive is built on Hadoop, and all Hive data is stored in HDFS. A database can store data on a block device or in a local file system.

3. Data update: Since Hive is designed for data warehouse applications, the content of the data warehouse requires more reading and less writing. Therefore, it is not recommended to rewrite data in Hive. All data is determined when loading. The data in the database usually needs to be modified frequently, so you can use INSERT INTO... VALUES to add data and UPDATE... SET to modify the data.

4. Index: Hive does not perform any processing on the data during the process of loading the data, and does not even scan the data. Therefore, some keys in the data are not indexed. When Hive wants to access specific values in the data that meet conditions, it needs to brute force scan the entire data, so the access latency is high. Due to the introduction of MapReduce, Hive can access data in parallel, so even without indexes, Hive can still show advantages for accessing large amounts of data. In a database, an index is usually created on one or several columns, so the database can have high efficiency and low latency for accessing a small amount of data with specific conditions. Due to the high latency of data access, Hive is not suitable for online data query.

5. Execution: Most queries in Hive are executed through MapReduce provided by Hadoop. The database usually has its own execution engine.

6. Execution delay: When Hive queries data, the entire table needs to be scanned because there is no index, so the delay is high. Another factor that contributes to high Hive execution latency is the MapReduce framework. Since MapReduce itself has high latency, Hive queries executed through MapReduce also have high latency. In contrast, the execution latency of a database is low. This holds only when the data scale is small: once the data scale exceeds the processing capabilities of the database, Hive's parallel computing shows its advantages.

7. Scalability: Since Hive is built on Hadoop, the scalability of Hive is consistent with that of Hadoop (in 2009, the world's largest Hadoop cluster, at Yahoo!, had around 4,000 nodes). Because of the strict restrictions of ACID semantics, the scalability of a database is very limited. Oracle, described here as the most advanced parallel database, has a theoretical scalability of only about 100 nodes.

8. Data scale: Since Hive is built on a cluster and can use MapReduce for parallel computing, it can support large-scale data; correspondingly, the database can support smaller data scale.
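Point 4 above contrasts a brute-force scan with an indexed lookup. A small Python sketch, using hypothetical in-memory rows, shows the difference in access pattern: the scan inspects every row, while the index jumps straight to the matching positions. It says nothing about how Hive or any particular database implements either approach.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical sample rows.
rows: List[Row] = [
    {"user_id": 1, "city": "Beijing", "amount": 20.5},
    {"user_id": 2, "city": "Shanghai", "amount": 7.0},
    {"user_id": 3, "city": "Beijing", "amount": 4.5},
]


def full_scan(data: List[Row], city: str) -> List[Row]:
    """No index: every row is inspected, so cost grows with table size."""
    return [r for r in data if r["city"] == city]


def build_index(data: List[Row], column: str) -> Dict[object, List[int]]:
    """Build a value -> row positions index on one column (done once, up front)."""
    index: Dict[object, List[int]] = defaultdict(list)
    for position, row in enumerate(data):
        index[row[column]].append(position)
    return index


def index_lookup(data: List[Row], index: Dict[object, List[int]], value: object) -> List[Row]:
    """With an index: only the matching positions are visited."""
    return [data[p] for p in index.get(value, [])]


city_index = build_index(rows, "city")
scanned = full_scan(rows, "Beijing")
looked_up = index_lookup(rows, city_index, "Beijing")
```

Both paths return the same rows; the difference is how much data has to be touched, which is why the scan only pays off when it can be spread across many machines in parallel.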
