
Hadoop Pig Algebraic Interface

Jun 07, 2016 04:30 PM


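I read through the Hadoop Pig UDF documentation carefully, and the design of the Algebraic interface is worth learning from.

Aggregate functions such as SUM and COUNT implement the Algebraic interface.

The interface declares three methods, and each of them returns the class name of a concrete implementation.

Each of those classes must in turn implement an exec method.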


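<code>public interface Algebraic {
    // Class name of the EvalFunc that runs first (map side).
    public String getInitial();
    // Class name of the EvalFunc that merges partial results (combiner).
    public String getIntermed();
    // Class name of the EvalFunc that produces the final result (reduce side).
    public String getFinal();
}
</code>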
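Returning class names rather than objects means the framework only needs a string to set up each stage; one plausible benefit is that a name is easy to ship to whichever task runs that stage and instantiate there. The sketch below mimics the idea in plain Java without depending on Pig. `AlgebraicSketch`, `Stage`, `SumSketch`, and `StageLoader` are hypothetical names, and loading by reflection through `Class.forName` is an assumption about how such a loader could work, not a copy of Pig's internals.

```java
import java.util.List;

// Hypothetical stand-in for Pig's Algebraic: each method returns a class name.
interface AlgebraicSketch {
    String getInitial();
    String getIntermed();
    String getFinal();
}

// Hypothetical stand-in for EvalFunc: one exec method per stage.
interface Stage {
    long exec(List<Long> input);
}

// A SUM-like function: every stage simply adds its inputs,
// so all three methods can name the same stage class.
final class SumSketch implements AlgebraicSketch {
    public String getInitial()  { return Adder.class.getName(); }
    public String getIntermed() { return Adder.class.getName(); }
    public String getFinal()    { return Adder.class.getName(); }

    static final class Adder implements Stage {
        public long exec(List<Long> input) {
            long total = 0;
            for (long v : input) total += v;
            return total;
        }
    }
}

final class StageLoader {
    // Turn a class name (as returned by getInitial/getIntermed/getFinal)
    // into a live Stage object using its no-argument constructor.
    static Stage load(String className) {
        try {
            return (Stage) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }
}
```

For example, `StageLoader.load(new SumSketch().getIntermed())` yields an `Adder`, and calling `exec` on the list `[1, 2, 3]` returns 6.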


Now look at the implementation of Pig's built-in COUNT.

These three methods correspond to Hadoop's map, combine, and reduce phases:

map corresponds to Initial

combine corresponds to Intermed

reduce corresponds to Final
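To make the mapping concrete, here is a toy simulation in plain Java. `CountPipeline` and its methods are hypothetical, and for simplicity `initial` counts a whole chunk of records at once. The `useCombiner` flag reflects the fact that Hadoop is free to skip the combine step, so the stages must give the same answer whether or not it runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of how COUNT's stages line up with
// map (Initial), combine (Intermed) and reduce (Final).
final class CountPipeline {
    // map: Initial counts the records in one input chunk.
    static long initial(List<String> chunk) {
        return chunk.size();
    }

    // combine: Intermed adds the partial counts produced on one node.
    static long intermed(List<Long> partials) {
        return add(partials);
    }

    // reduce: Final adds whatever partial counts reach the reducer.
    static long finalStage(List<Long> partials) {
        return add(partials);
    }

    private static long add(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    // Each node holds several chunks. useCombiner=false models Hadoop
    // skipping the combine step, which it is allowed to do.
    static long run(List<List<List<String>>> nodes, boolean useCombiner) {
        List<Long> toReducer = new ArrayList<>();
        for (List<List<String>> node : nodes) {
            List<Long> partials = new ArrayList<>();
            for (List<String> chunk : node) partials.add(initial(chunk));
            if (useCombiner) {
                toReducer.add(intermed(partials));
            } else {
                toReducer.addAll(partials);
            }
        }
        return finalStage(toReducer);
    }
}
```

Because Intermed and Final both add partial counts, the total is the same with or without the combiner.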

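Java's static nested classes turn out to be quite useful here: COUNT keeps all three stage classes in one source file, and because they are static, each can be instantiated from its class name without an enclosing COUNT instance.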
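The sketch below illustrates why `static` matters when a framework only has a class name: a non-static inner class has no true no-argument constructor, because its constructor implicitly takes the enclosing instance. `NestingDemo` and its members are hypothetical names used only for illustration.

```java
// Contrast a static nested class with a non-static inner class when an
// object must be created from nothing but its binary class name.
final class NestingDemo {
    // Static nested class: has a real no-argument constructor.
    static class StaticStage {
        String stage() { return "static"; }
    }

    // Inner (non-static) class: its constructor implicitly takes a NestingDemo.
    class InnerStage {
        String stage() { return "inner"; }
    }

    // Try to create an instance from a name such as "NestingDemo$StaticStage".
    static boolean canInstantiateByName(String className) {
        try {
            Class.forName(className).getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // e.g. NoSuchMethodException for an inner class
            return false;
        }
    }
}
```

Passing `NestingDemo.StaticStage.class.getName()` succeeds, while the inner class fails with `NoSuchMethodException`, which is why stage classes like `COUNT.Initial` are declared `static`.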


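<code>import java.io.IOException;
import java.util.Iterator;
import java.util.Map;

import org.apache.pig.Algebraic;
import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class COUNT extends EvalFunc<Long> implements Algebraic {

    // Used when the computation is not split into stages.
    public Long exec(Tuple input) throws IOException { return count(input); }

    // Class names of the per-stage implementations.
    public String getInitial()  { return Initial.class.getName(); }
    public String getIntermed() { return Intermed.class.getName(); }
    public String getFinal()    { return Final.class.getName(); }

    // Map side: count the input bag and wrap the result in a tuple.
    static public class Initial extends EvalFunc<Tuple> {
        public Tuple exec(Tuple input) throws IOException {
            return TupleFactory.getInstance().newTuple(count(input));
        }
    }

    // Combine side: add up the partial counts produced by Initial.
    static public class Intermed extends EvalFunc<Tuple> {
        public Tuple exec(Tuple input) throws IOException {
            return TupleFactory.getInstance().newTuple(sum(input));
        }
    }

    // Reduce side: add up the partial counts and return the final Long.
    static public class Final extends EvalFunc<Long> {
        public Long exec(Tuple input) throws IOException { return sum(input); }
    }

    // Number of elements in the bag (or map) held in field 0.
    static protected Long count(Tuple input) throws ExecException {
        Object values = input.get(0);
        if (values instanceof DataBag) return ((DataBag) values).size();
        else if (values instanceof Map) return Long.valueOf(((Map<?, ?>) values).size());
        throw new ExecException("COUNT expects a bag or a map, got: " + values);
    }

    // Sum of the Long in field 0 of every tuple in the input bag.
    static protected Long sum(Tuple input) throws ExecException, NumberFormatException {
        DataBag values = (DataBag) input.get(0);
        long sum = 0;
        for (Iterator<Tuple> it = values.iterator(); it.hasNext();) {
            Tuple t = it.next();
            sum += (Long) t.get(0);
        }
        return sum;
    }
}
</code>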
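Note that `Intermed` and `Final` call `sum`, not `count`. By the time those stages run, each incoming tuple already holds a partial count, so counting the tuples again would return the number of partial results rather than the number of records. A minimal plain-Java illustration, where the class `CountMergeDemo` is hypothetical:

```java
import java.util.List;

// Why COUNT's later stages must sum: compare re-counting against summing
// the partial counts that the Initial stage produced for each input split.
final class CountMergeDemo {
    // Wrong merge: returns how many partial results there are.
    static long recount(List<Long> partialCounts) {
        return partialCounts.size();
    }

    // Correct merge (what Intermed and Final do): adds the partial counts.
    static long sum(List<Long> partialCounts) {
        long total = 0;
        for (long c : partialCounts) total += c;
        return total;
    }
}
```

With partial counts of 2, 1 and 3, `recount` gives 3 while `sum` gives the correct total of 6.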
