Hive Analytic Window Functions (5): GROUPING SETS, GROUPING__ID, CUBE, ROLLUP

Jun 07, 2016 pm 02:51 PM

1. Which other approach is GROUPING SETS equivalent to?
2. Which keyword aggregates over every combination of the GROUP BY dimensions?
3. How are CUBE and ROLLUP related?


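GROUPING SETS, GROUPING__ID, CUBE, and ROLLUP are typically used in OLAP scenarios where a metric cannot simply be summed across groups and has to be computed separately at each level when drilling up or down through different dimensions, for example UV (unique visitor) counts by hour, day, and month.

The Hive version used here is apache-hive-0.13.1.

Data preparation: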
    2015-03,2015-03-10,cookie1
    2015-03,2015-03-10,cookie5
    2015-03,2015-03-12,cookie7
    2015-04,2015-04-12,cookie3
    2015-04,2015-04-13,cookie2
    2015-04,2015-04-13,cookie4
    2015-04,2015-04-16,cookie4
    2015-03,2015-03-10,cookie2
    2015-03,2015-03-10,cookie3
    2015-04,2015-04-12,cookie5
    2015-04,2015-04-13,cookie6
    2015-04,2015-04-15,cookie3
    2015-04,2015-04-15,cookie2
    2015-04,2015-04-16,cookie1

    CREATE EXTERNAL TABLE lxw1234 (
      month STRING,
      day STRING,
      cookieid STRING
    ) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE LOCATION '/tmp/lxw11/';


    hive> select * from lxw1234;
    OK
    2015-03 2015-03-10      cookie1
    2015-03 2015-03-10      cookie5
    2015-03 2015-03-12      cookie7
    2015-04 2015-04-12      cookie3
    2015-04 2015-04-13      cookie2
    2015-04 2015-04-13      cookie4
    2015-04 2015-04-16      cookie4
    2015-03 2015-03-10      cookie2
    2015-03 2015-03-10      cookie3
    2015-04 2015-04-12      cookie5
    2015-04 2015-04-13      cookie6
    2015-04 2015-04-15      cookie3
    2015-04 2015-04-15      cookie2
    2015-04 2015-04-16      cookie1
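To make the "cannot be summed" point concrete, here is a small Python sketch (not Hive code; the helper name `uv_by` is made up for illustration) that recomputes UV from the sample rows above and compares the sum of the daily UVs for April with the monthly UV.

```python
from collections import defaultdict

# (month, day, cookieid) rows copied from the sample data above.
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]


def uv_by(key_index):
    """Distinct cookie count per value of the column at key_index."""
    cookies_by_key = defaultdict(set)
    for row in ROWS:
        cookies_by_key[row[key_index]].add(row[2])
    return {key: len(cookies) for key, cookies in cookies_by_key.items()}


daily_uv = uv_by(1)    # keyed by day
monthly_uv = uv_by(0)  # keyed by month

# Adding up the daily UVs of April counts a cookie once per day it appears.
april_sum_of_days = sum(
    uv for day, uv in daily_uv.items() if day.startswith("2015-04")
)
print(april_sum_of_days, monthly_uv["2015-04"])  # 9 6
```

The daily values add up to 9 while the true monthly UV is 6, which is why each level has to be aggregated from the detail rows rather than derived from a finer level.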

GROUPING SETS
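Within a single GROUP BY query, GROUPING SETS aggregates over several different combinations of dimensions. The result is equivalent to running a separate GROUP BY for each combination and combining the result sets with UNION ALL.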
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM lxw1234
    GROUP BY month,day
    GROUPING SETS (month,day)
    ORDER BY GROUPING__ID;

    month      day             uv      GROUPING__ID
    ------------------------------------------------
    2015-03    NULL            5       1
    2015-04    NULL            6       1
    NULL       2015-03-10      4       2
    NULL       2015-03-12      1       2
    NULL       2015-04-12      2       2
    NULL       2015-04-13      3       2
    NULL       2015-04-15      2       2
    NULL       2015-04-16      2       2


    -- Equivalent to:
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
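The UNION ALL equivalence can also be checked outside Hive. The following Python sketch is only a simulation on the sample rows; `group_uv` and `grouping_sets` are hypothetical helper names, not Hive APIs. It runs one GROUP BY per grouping set and concatenates the results.

```python
from collections import defaultdict

# (month, day, cookieid) rows copied from the sample data in this article.
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]
COLUMNS = ("month", "day")


def group_uv(rows, keep):
    """Distinct-cookie count per group; columns not in `keep` become None (NULL)."""
    cookies_by_key = defaultdict(set)
    for month, day, cookie in rows:
        values = {"month": month, "day": day}
        key = tuple(values[c] if c in keep else None for c in COLUMNS)
        cookies_by_key[key].add(cookie)
    # Sort for stable output; None is mapped to "" so keys stay comparable.
    ordered = sorted(cookies_by_key, key=lambda k: tuple(v or "" for v in k))
    return [key + (len(cookies_by_key[key]),) for key in ordered]


def grouping_sets(rows, sets):
    """Concatenate one GROUP BY result per grouping set (the UNION ALL view)."""
    result = []
    for keep in sets:
        result.extend(group_uv(rows, keep))
    return result


result = grouping_sets(ROWS, [("month",), ("day",)])
for row in result:
    print(row)  # (month, day, uv)
```

The eight rows printed carry the same month, day, and uv values as the Hive output for `GROUPING SETS (month,day)` shown above.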

Another example:
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM lxw1234
    GROUP BY month,day
    GROUPING SETS (month,day,(month,day))
    ORDER BY GROUPING__ID;

    month         day             uv      GROUPING__ID
    ------------------------------------------------
    2015-03       NULL            5       1
    2015-04       NULL            6       1
    NULL          2015-03-10      4       2
    NULL          2015-03-12      1       2
    NULL          2015-04-12      2       2
    NULL          2015-04-13      3       2
    NULL          2015-04-15      2       2
    NULL          2015-04-16      2       2
    2015-03       2015-03-10      4       3
    2015-03       2015-03-12      1       3
    2015-04       2015-04-12      2       3
    2015-04       2015-04-13      3       3
    2015-04       2015-04-15      2       3
    2015-04       2015-04-16      2       3


    -- Equivalent to:
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
    UNION ALL
    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day

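GROUPING__ID indicates which grouping set each result row belongs to. The values shown in this article are those produced by Hive 0.13.1; Hive 2.3.0 changed how GROUPING__ID is computed, so newer versions number the grouping sets differently.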
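The GROUPING__ID values in the outputs of this article are consistent with a simple bitmask: the first GROUP BY column is the lowest bit, and a bit is set when that column takes part in the grouping set. The sketch below encodes that reading. It is an assumption inferred from the 0.13.1 outputs shown here, not a statement about other Hive versions, and `grouping_id` is a hypothetical helper name.

```python
def grouping_id(group_by_columns, grouping_set):
    """Bitmask with bit i set when the i-th GROUP BY column is in the grouping set."""
    return sum(
        1 << i for i, col in enumerate(group_by_columns) if col in grouping_set
    )


month_day = ("month", "day")
ids = {
    gs: grouping_id(month_day, gs)
    for gs in [(), ("month",), ("day",), ("month", "day")]
}
print(ids)  # {(): 0, ('month',): 1, ('day',): 2, ('month', 'day'): 3}

# With GROUP BY day, month the first column is day, so "day only" becomes 1.
day_first = grouping_id(("day", "month"), ("day",))
print(day_first)  # 1
```

Under this reading, the value depends on the column order in the GROUP BY clause, not on the order in which the grouping sets are listed.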

CUBE
CUBE aggregates over every combination of the GROUP BY dimensions.
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM lxw1234
    GROUP BY month,day
    WITH CUBE
    ORDER BY GROUPING__ID;


    month           day             uv      GROUPING__ID
    ----------------------------------------------------
    NULL            NULL            7       0
    2015-03         NULL            5       1
    2015-04         NULL            6       1
    NULL            2015-04-12      2       2
    NULL            2015-04-13      3       2
    NULL            2015-04-15      2       2
    NULL            2015-04-16      2       2
    NULL            2015-03-10      4       2
    NULL            2015-03-12      1       2
    2015-03         2015-03-10      4       3
    2015-03         2015-03-12      1       3
    2015-04         2015-04-16      2       3
    2015-04         2015-04-12      2       3
    2015-04         2015-04-13      3       3
    2015-04         2015-04-15      2       3



    -- Equivalent to:
    SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM lxw1234
    UNION ALL
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
    UNION ALL
    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day
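"Every combination" means every subset of the GROUP BY columns, including the empty one. As a rough Python simulation on the sample rows (helper names `cube_sets`, `aggregate`, and `cube` are invented for this sketch, and the GROUPING__ID numbering follows the Hive 0.13.1 outputs in this article), the subsets can be enumerated with `itertools.combinations`:

```python
from collections import defaultdict
from itertools import combinations

# (month, day, cookieid) rows copied from the sample data in this article.
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]


def cube_sets(columns):
    """All subsets of the GROUP BY columns, smallest first."""
    return [
        subset
        for size in range(len(columns) + 1)
        for subset in combinations(columns, size)
    ]


def aggregate(rows, columns, grouping_set):
    """Rows of (month, day, uv, grouping_id) for one grouping set."""
    cookies_by_key = defaultdict(set)
    for month, day, cookie in rows:
        values = {"month": month, "day": day}
        key = tuple(values[c] if c in grouping_set else None for c in columns)
        cookies_by_key[key].add(cookie)
    # Bitmask as seen in the 0.13.1 outputs: first column is the lowest bit.
    gid = sum(1 << i for i, c in enumerate(columns) if c in grouping_set)
    return [key + (len(cookies), gid) for key, cookies in cookies_by_key.items()]


def cube(rows, columns):
    """Concatenate the aggregates of every subset of the columns."""
    result = []
    for grouping_set in cube_sets(columns):
        result.extend(aggregate(rows, columns, grouping_set))
    return result


print(cube_sets(("month", "day")))  # [(), ('month',), ('day',), ('month', 'day')]
result = cube(ROWS, ("month", "day"))
print(len(result))  # 15
```

With n GROUP BY columns this produces 2^n grouping sets, so the amount of work grows quickly as dimensions are added.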

ROLLUP
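ROLLUP produces a subset of what CUBE produces: it aggregates hierarchically, starting from the leftmost dimension.
    -- For example, hierarchical aggregation led by the month dimension: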
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID  
    FROM lxw1234
    GROUP BY month,day
    WITH ROLLUP
    ORDER BY GROUPING__ID;

    month            day             uv      GROUPING__ID
    -----------------------------------------------------
    NULL             NULL            7       0
    2015-03          NULL            5       1
    2015-04          NULL            6       1
    2015-03          2015-03-10      4       3
    2015-03          2015-03-12      1       3
    2015-04          2015-04-12      2       3
    2015-04          2015-04-13      3       3
    2015-04          2015-04-15      2       3
    2015-04          2015-04-16      2       3

    This supports the following drill-up path:
    UV per (month, day) -> UV per month -> total UV

    -- Swapping month and day makes the hierarchical aggregation start from day:

    SELECT
    day,
    month,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID  
    FROM lxw1234
    GROUP BY day,month
    WITH ROLLUP
    ORDER BY GROUPING__ID;


    day             month              uv      GROUPING__ID
    -------------------------------------------------------
    NULL            NULL               7       0
    2015-04-13      NULL               3       1
    2015-03-12      NULL               1       1
    2015-04-15      NULL               2       1
    2015-03-10      NULL               4       1
    2015-04-16      NULL               2       1
    2015-04-12      NULL               2       1
    2015-04-12      2015-04            2       3
    2015-03-10      2015-03            4       3
    2015-03-12      2015-03            1       3
    2015-04-13      2015-04            3       3
    2015-04-15      2015-04            2       3
    2015-04-16      2015-04            2       3

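    This supports the following drill-up path:
    UV per (day, month) -> UV per day -> total UV
    (Here, aggregating by day and month gives the same result as aggregating by day alone, because each day belongs to exactly one month, a parent-child relationship. With other, unrelated dimension combinations the two results would differ.)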
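One way to picture ROLLUP is as the list of prefixes of the GROUP BY columns. The Python sketch below is a simulation on the sample rows, with `rollup_sets` and `uv` as made-up helper names. It builds those prefixes for both column orders and checks the parent-child remark about day and month.

```python
from collections import defaultdict

# (month, day, cookieid) rows copied from the sample data in this article.
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]
FIELDS = ("month", "day", "cookieid")


def rollup_sets(columns):
    """Prefixes of the GROUP BY column list: (), (c1,), (c1, c2), ..."""
    return [tuple(columns[:n]) for n in range(len(columns) + 1)]


def uv(rows, grouping_set):
    """Distinct cookie count per combination of the columns in grouping_set."""
    cookies_by_key = defaultdict(set)
    for row in rows:
        record = dict(zip(FIELDS, row))
        key = tuple(record[c] for c in grouping_set)
        cookies_by_key[key].add(record["cookieid"])
    return {key: len(cookies) for key, cookies in cookies_by_key.items()}


print(rollup_sets(("month", "day")))  # [(), ('month',), ('month', 'day')]
print(rollup_sets(("day", "month")))  # [(), ('day',), ('day', 'month')]

by_month_first = {gs: uv(ROWS, gs) for gs in rollup_sets(("month", "day"))}
by_day_first = {gs: uv(ROWS, gs) for gs in rollup_sets(("day", "month"))}

# Each day belongs to exactly one month, so adding month to the key changes nothing.
per_day = {key[0]: n for key, n in by_day_first[("day",)].items()}
per_day_month = {key[0]: n for key, n in by_day_first[("day", "month")].items()}
same_uv = per_day == per_day_month
print(same_uv)  # True
```

Unlike CUBE, the month-first rollup never produces a "day only" level and the day-first rollup never produces a "month only" level, which matches the GROUPING__ID values missing from the two outputs above.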