Table of Contents
Hive查询
排序和聚集
MapReduce脚本
连接(join)
内连接(inner join)
外连接(outer join)
半连接(semi join)
Map连接(map join)
子查询(sub query)
视图(view)

Hive查询

Jun 07, 2016 pm 03:28 PM
hive or sort supply Inquire pass

Hive查询 排序和聚集 通过Hive提供的order by子句可以让最终的输出结果整体有序。但是因为Hive是基于Hadoop之上的,要生成这种整体有序的结果,就必须强迫Hadoop只利用一个Reduce来完成处理。这种方式的副作用就是回降低效率。 如果你不需要最终结果整体有序

Hive查询

排序和聚集

通过Hive提供的order by子句可以让最终的输出结果整体有序。但是因为Hive是基于Hadoop之上的,要生成这种整体有序的结果,就必须强迫Hadoop只利用一个Reduce来完成处理。这种方式的副作用就是回降低效率。

如果你不需要最终结果整体有序,你就可以使用sort by子句来进行排序。这种排序操作只保证每个Reduce的输出是有序的。如果你希望某些特定行被同一个Reduce处理,则你可以使用distribute子句来完成。比如:

表student(classNo,stuNo,score)数据如下:

C01 N0101 82

C01 N0102 59

C02 N0201 81

C01 N0103 65

C03 N0302 92

C02 N0202 82

C02 N0203 79

C03 N0301 56

C03 N0306 72

我们希望按照成绩由低到高输出每个班级的成绩信息。执行以下语句:

Select classNo,stuNo,score from student distribute byclassNo sort by score;

输出结果为:

C02 N0203 79

C02 N0201 81

C02 N0202 82

C03 N0301 56

C03 N0306 72

C03 N0302 92

C01 N0102 59

C01 N0103 65

C01 N0101 82

我们可以看到每一个班级里所有的学生成绩是有序的。因为同一个classNo的记录会被分发到一个单独的reduce处理,而同时sort by保证了每一个reduce的输出是有序的。

注意:

为了测试上例中的distribute by的效果,你应该首先设置足够多的reduce。比如上例中有3个不同的classNo,则我们需要设置reduce个数至少为3或更多。如果设置的reduce个数少于3,将会导致多个不同的classNo被分发到同一个reduce,从而不能产生你所期望的输出。设置命令如下:

set mapred.reduce.tasks = 3;

MapReduce脚本

如果我们需要在查询语句中调用外部脚本,比如Python,则我们可以使用transform,map,reduce等子句。

比如,我们希望过滤掉所有不及格的学生记录,只输出及格学生的成绩信息。

新建一个Python脚本文件score_pass.py,内容如下:

#! /usr/bin/env python

import sys

for line in sys.stdin:

(classNo,stuNo,score)= line.strip().split('\t')

ifint(score) >= 60:

print"%s\t%s\t%s" %(classNo,stuNo,score)

执行以下语句

add file /home/user/score_pass.py;

select transform(classNo,stuNo,score) using'score_pass.py' as classNo,stuNo,score from student;

输出结果为:

C01 N0101 82

C02 N0201 81

C01 N0103 65

C03 N0302 92

C02 N0202 82

C02 N0203 79

C03 N0306 72

注意:

1) 以上Python脚本中,分隔符只能是制表符(\t)。同样输出的分隔符也必须为制表符。这个是有hive自身决定的,不能更改,不要尝试使用其他分隔符,否则会报错。同时需要调用strip函数,以去除掉行尾的换行符。(或者直接使用不带参数的line.split()代替。

2) 使用脚本前,先使用add file语句注册脚本文件,以便hive将其分发到Hadoop集群。

3) Transfom传递数据到Python脚本,as语句指定输出的列。

连接(join)

直接编程使用Hadoop的MapReduce是一件比较费时的事情。Hive则大大简化了这个操作。

内连接(inner join)

和SQL的内连相似。执行以下语句查询每个学生的编号和教师名:

Select a.stuNo,b.teacherName from student a join teacherb on a.classNo = b.classNo;

输出结果如下:

N0203 Sun

N0202 Sun

N0201 Sun

N0306 Wang

N0301 Wang

N0302 Wang

N0103 Zhang

N0102 Zhang

N0101 Zhang

注意:

数据文件内容请参照上一篇文章。

不要使用select xx from aa bb where aa.f=bb.f这样的语法,hive不支持这种写法。

如果需要查看hive的执行计划,你可以在语句前加上explain,比如:

explain Select a.stuNo,b.teacherName from student a jointeacher b on a.classNo = b.classNo;

外连接(outer join)

和传统SQL类似,Hive提供了left outer join,right outer join,full out join。

半连接(semi join)

Hive不提供in子查询。此时你可以用leftsemi join实现同样的功能。

执行以下语句:

Select * from teacher left semi join student onstudent.classNo = teacher.classNo;

输出结果如下:

C02 Sun

C03 Wang

C01 Zhang

可以看出,C04 Dong没有出现在查询结果中,因为C04在表student中不存在。

注意:

右表(student)中的字段只能出现在on子句中,不能出现在其他地方,比如不能出现在select子句中。

Map连接(map join)

当一个表非常小,足以直接装载到内存中去时,可以使用map连接以提高效率,比如:

Select /*+mapjoin(teacher) */ a.stuNo,b.teacherNamefrom student a join teacher b on a.classNo = b.classNo;

以上红色标记部分采用了C的注释风格。

当连接时用到不等值判断时,也比较适合Map连接。具体原因需要深入了解Hive和MapReduce的工作原理。

子查询(sub query)

运行以下语句将返回所有班级平均分的最高记录。

Select max(avgScore) as maScore

from

(Select classNo,avg(score) as avgScore from student group byclassNo) a;

输出结果:

80.66666666666667

以上语句中红色部分为一个子查询,且别名为a。返回的子查询结果和一个表类似,可以被继续查询。

视图(view)

和传统数据库中的视图类似,Hive的视图只是一个定义,视图数据并不会存储到文件系统中。同样,视图是只读的。

运行以下两个命令:

Create view avg_score as

Select classNo,avg(score) as avgScore from student groupby classNo;

Select max(avgScore) as maScore

From avg_score;

可以看到输出结果和上例中的结果是一样的。

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

12306 How to check historical ticket purchase records How to check historical ticket purchase records 12306 How to check historical ticket purchase records How to check historical ticket purchase records Mar 28, 2024 pm 03:11 PM

Download the latest version of 12306 ticket booking app. It is a travel ticket purchasing software that everyone is very satisfied with. It is very convenient to go wherever you want. There are many ticket sources provided in the software. You only need to pass real-name authentication to purchase tickets online. All users You can easily buy travel tickets and air tickets and enjoy different discounts. You can also start booking reservations in advance to grab tickets. You can book hotels or special car transfers. With it, you can go where you want to go and buy tickets with one click. Traveling is simpler and more convenient, making everyone's travel experience more comfortable. Now the editor details it online Provides 12306 users with a way to view historical ticket purchase records. 1. Open Railway 12306, click My in the lower right corner, and click My Order 2. Click Paid on the order page. 3. On the paid page

How to check your academic qualifications on Xuexin.com How to check your academic qualifications on Xuexin.com Mar 28, 2024 pm 04:31 PM

How to check my academic qualifications on Xuexin.com? You can check your academic qualifications on Xuexin.com, but many users don’t know how to check their academic qualifications on Xuexin.com. Next, the editor brings you a graphic tutorial on how to check your academic qualifications on Xuexin.com. Interested users come and take a look! Xuexin.com usage tutorial: How to check your academic qualifications on Xuexin.com 1. Xuexin.com entrance: https://www.chsi.com.cn/ 2. Website query: Step 1: Click on the Xuexin.com address above to enter the homepage Click [Education Query]; Step 2: On the latest webpage, click [Query] as shown by the arrow in the figure below; Step 3: Then click [Login Academic Credit File] on the new page; Step 4: On the login page Enter the information and click [Login];

How to sort WPS scores How to sort WPS scores Mar 20, 2024 am 11:28 AM

In our work, we often use wps software. There are many ways to process data in wps software, and the functions are also very powerful. We often use functions to find averages, summaries, etc. It can be said that as long as The methods that can be used for statistical data have been prepared for everyone in the WPS software library. Below we will introduce the steps of how to sort the scores in WPS. After reading this, you can learn from the experience. 1. First open the table that needs to be ranked. As shown below. 2. Then enter the formula =rank(B2, B2: B5, 0), and be sure to enter 0. As shown below. 3. After entering the formula, press the F4 key on the computer keyboard. This step is to change the relative reference into an absolute reference.

How to check the activation date on Apple mobile phone How to check the activation date on Apple mobile phone Mar 08, 2024 pm 04:07 PM

If you want to check the activation date using an Apple mobile phone, the best way is to check it through the serial number in the mobile phone. You can also check it by visiting Apple's official website, connecting it to a computer, and downloading third-party software to check it. How to check the activation date of Apple mobile phone Answer: Serial number query, Apple official website query, computer query, third-party software query 1. The best way for users is to know the serial number of their mobile phone. You can see the serial number by opening Settings, General, About This Machine. . 2. Using the serial number, you can not only know the activation date of your mobile phone, but also check the mobile phone version, mobile phone origin, mobile phone factory date, etc. 3. Users visit Apple's official website to find technical support, find the service and repair column at the bottom of the page, and check the iPhone activation information there. 4. User

How to sort WPS tables to facilitate data statistics How to sort WPS tables to facilitate data statistics Mar 20, 2024 pm 04:31 PM

WPS is a very complete office software, including text editing, data tables, PPT presentations, PDF formats, flow charts and other functions. Among them, the ones we use most are text, tables, and demonstrations, and they are also the ones we are most familiar with. In our study work, we sometimes use WPS tables to make some data statistics. For example, the school will count the scores of each student. If we have to manually sort the scores of so many students, it will be really a headache. In fact, we don’t have to worry, because our WPS table has a sorting function to solve this problem for us. Next, let’s learn how to sort WPS together. Method steps: Step 1: First we need to open the WPS table that needs to be sorted

How to reorder multiple columns in Power Query via drag and drop How to reorder multiple columns in Power Query via drag and drop Mar 14, 2024 am 10:55 AM

In this article, we will show you how to reorder multiple columns in PowerQuery by dragging and dropping. Often, when importing data from various sources, columns may not be in the desired order. Reordering columns not only allows you to arrange them in a logical order that suits your analysis or reporting needs, it also improves the readability of your data and speeds up tasks such as filtering, sorting, and performing calculations. How to rearrange multiple columns in Excel? There are many ways to rearrange columns in Excel. You can simply select the column header and drag it to the desired location. However, this approach can become cumbersome when dealing with large tables with many columns. To rearrange columns more efficiently, you can use the enhanced query editor. Enhancing the query

Comparison of similarities and differences between MySQL and PL/SQL Comparison of similarities and differences between MySQL and PL/SQL Mar 16, 2024 am 11:15 AM

MySQL and PL/SQL are two different database management systems, representing the characteristics of relational databases and procedural languages ​​respectively. This article will compare the similarities and differences between MySQL and PL/SQL, with specific code examples to illustrate. MySQL is a popular relational database management system that uses Structured Query Language (SQL) to manage and operate databases. PL/SQL is a procedural language unique to Oracle database and is used to write database objects such as stored procedures, triggers and functions. same

Advanced sorting of PHP arrays: custom comparators and anonymous functions Advanced sorting of PHP arrays: custom comparators and anonymous functions Apr 27, 2024 am 11:09 AM

In PHP, there are two ways to sort an array in a custom order: Custom comparator: implement the Comparable interface and specify the comparison rules of the two objects. Anonymous function: Create an anonymous function as a custom comparator to compare two objects against a criterion.

See all articles