MRUnit Usage Tips

Jun 07, 2016, 04:33 PM

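Introduction

To test the Hadoop components and MapReduce programs you write, there are generally three approaches:

1. Use the hadoop-eclipse plugin to debug MapReduce programs, although newer Hadoop versions no longer ship it;

2. Configure JVM parameters to debug Hadoop components remotely. This approach works well for reading the Hadoop source code, but it is somewhat cumbersome for remotely debugging MapReduce programs;

Detailed references: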

http://blog.javachen.com/hadoop/2013/08/01/remote-debug-hadoop/

http://zhangjie.me/eclipse-debug-hadoop/

3. In the end, I chose MRUnit as my main tool for developing and debugging MapReduce applications.

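About MRUnit

MRUnit is a Java library for unit-testing MapReduce code. It is published by Apache and can be downloaded from: http://mrunit.apache.org/general/downloads.html

The MRUnit test framework is built on JUnit, which makes it easy to test MapReduce programs. It works with Hadoop 0.20, 0.23.x, 1.0.x, and 2.x.

Let's walk through the official MRUnit example (SMS CDR (call details record) analysis):

The usage records look like this: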

CDRID;CDRType;Phone1;Phone2;SMS Status Code
655209;1;796764372490213;804422938115889;6
353415;0;356857119806206;287572231184798;4
835699;1;252280313968413;889717902341635;0

The task is to find every record whose CDRType is 1, together with its status code (SMS Status Code).
The map output should be:
6, 1
0, 1
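
The filtering rule can be tried out in plain Java before any Hadoop class is involved. The following is only a sketch: the class name CdrFilterSketch and the method emittedPairs are hypothetical and not part of the official example. It applies the split-on-";" rule to the three sample records above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the official example:
// mimics the map step in plain Java, without Hadoop.
class CdrFilterSketch {
  // Returns one "status, 1" string per record whose CDRType (field 1) equals 1.
  static List<String> emittedPairs(List<String> records) {
    List<String> pairs = new ArrayList<>();
    for (String record : records) {
      String[] fields = record.split(";");
      if (Integer.parseInt(fields[1]) == 1) {
        pairs.add(fields[4] + ", 1");
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    List<String> records = new ArrayList<>();
    records.add("655209;1;796764372490213;804422938115889;6");
    records.add("353415;0;356857119806206;287572231184798;4");
    records.add("835699;1;252280313968413;889717902341635;0");
    // Prints "6, 1" and "0, 1", one per line.
    for (String pair : emittedPairs(records)) {
      System.out.println(pair);
    }
  }
}
```

The second record is skipped because its CDRType is 0, which is why only two pairs appear.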

The code is as follows:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private Text status = new Text();
  private final static IntWritable addOne = new IntWritable(1);

  /**
   * Emits (SMS status code, 1) for every SMS CDR record
   */
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // 655209;1;796764372490213;804422938115889;6 is the sample record format
    String[] line = value.toString().split(";");
    // Only records with CDRType == 1 are SMS CDRs
    if (Integer.parseInt(line[1]) == 1) {
      status.set(line[4]);
      context.write(status, addOne);
    }
  }
}

The reducer sums up the counts for each status code. The code is as follows:

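import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SMSCDRReducer extends
    Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Add up all the 1s emitted by the mapper for this status code
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    context.write(key, new IntWritable(sum));
  }
}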
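
To see what the reduce step computes, the grouping and summing can be imitated in plain Java. This is a sketch under assumptions: ReduceSketch and countByStatus are hypothetical names, and the input list in main is made up for illustration rather than taken from the sample records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the official example:
// imitates grouping by key followed by the reducer's sum, without Hadoop.
class ReduceSketch {
  // Counts how often each status code was emitted by the map step.
  // Adding 1 per occurrence is the same as summing a list of 1s per key.
  static Map<String, Integer> countByStatus(List<String> statusCodes) {
    Map<String, Integer> counts = new TreeMap<>(); // keeps keys sorted
    for (String status : statusCodes) {
      counts.merge(status, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Made-up map output: status 6 emitted twice, status 0 once.
    System.out.println(countByStatus(Arrays.asList("6", "0", "6"))); // prints {0=1, 6=2}
  }
}
```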

The MRUnit test program is as follows:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class SMSCDRMapperReducerTest {
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

  @Before
  public void setUp() {
    SMSCDRMapper mapper = new SMSCDRMapper();
    SMSCDRReducer reducer = new SMSCDRReducer();
    mapDriver = MapDriver.newMapDriver(mapper);
    reduceDriver = ReduceDriver.newReduceDriver(reducer);
    mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
  }

  // An SMS record (CDRType 1) with status 6 should be mapped to (6, 1).
  // IOException is declared because runTest() throws it in MRUnit 1.x.
  @Test
  public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    mapDriver.withOutput(new Text("6"), new IntWritable(1));
    mapDriver.runTest();
  }

  // Two counts of 1 for status 6 should be reduced to (6, 2).
  @Test
  public void testReducer() throws IOException {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("6"), values);
    reduceDriver.withOutput(new Text("6"), new IntWritable(2));
    reduceDriver.runTest();
  }
}

Anyone who has used JUnit will know how to run the code above, so I won't repeat it here.

MRUnit can test a single Map, a single Reduce, a complete MapReduce job, or a chain of several MapReduce jobs.
For details, see the official documentation: MRUnit Tutorial
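
As an illustration of testing a whole MapReduce job, here is a minimal sketch that drives the mapper and reducer together through MapReduceDriver. The class name SMSCDRMapReduceTest and the method testMapReduce are made up for this sketch. It assumes SMSCDRMapper and SMSCDRReducer are on the test classpath and declared with their type parameters, and that outputs are compared in key order, so "0" is listed before "6".

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

// Sketch only: class and method names are illustrative, not from the official tutorial.
public class SMSCDRMapReduceTest {
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

  @Before
  public void setUp() {
    mapReduceDriver =
        MapReduceDriver.newMapReduceDriver(new SMSCDRMapper(), new SMSCDRReducer());
  }

  @Test
  public void testMapReduce() throws IOException {
    // Feed the three sample records through map, shuffle, and reduce.
    mapReduceDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    mapReduceDriver.withInput(new LongWritable(), new Text(
        "353415;0;356857119806206;287572231184798;4"));
    mapReduceDriver.withInput(new LongWritable(), new Text(
        "835699;1;252280313968413;889717902341635;0"));
    // The CDRType 0 record is dropped by the mapper.
    // Assumption: reducer keys arrive sorted, so "0" comes before "6".
    mapReduceDriver.withOutput(new Text("0"), new IntWritable(1));
    mapReduceDriver.withOutput(new Text("6"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}
```

It runs like any other JUnit test once the MRUnit and Hadoop jars are available to the test runner.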

Reference: http://www.cnblogs.com/gpcuster/archive/2009/10/04/1577921.html
