使用RawComparator加速Hadoop程序
在前面两篇文章[1][2]中我们介绍了Hadoop序列化的相关知识,包括Writable接口与Writable对象以及如何编写定制的Writable类,深入的分析了Writable类序列化之后占用的字节空间以及字节序列的构成。我们指出Hadoop序列化是Hadoop的核心部分之一,了解和分析Wri
在前面两篇文章[1][2]中我们介绍了Hadoop序列化的相关知识,包括Writable接口与Writable对象以及如何编写定制的Writable类,深入的分析了Writable类序列化之后占用的字节空间以及字节序列的构成。我们指出Hadoop序列化是Hadoop的核心部分之一,了解和分析Writable类的相关知识有助于我们理解Hadoop序列化的工作方式以及选择合适的Writable类作为MapReduce的键和值,以达到高效利用磁盘空间以及快速读写对象。因为在数据密集型计算中,在网络数据的传输是影响计算效率的一个重要因素,选择合适的Writable对象不但减小了磁盘空间,而且更重要的是其减小了需要在网络中传输的数据量,从而加快了程序的速度。
在本文中我们介绍另外一种方法加快程序的速度,这就是使用RawComparator加速Hadoop程序。我们知道作为键(Key)的Writable类必须实现WritableComparable接口,以实现对键进行排序的功能。Writable类进行比较时,Hadoop的默认方式是先将序列化后的对象字节流反序列化为对象,然后再进行比较(compareTo方法),比较过程需要一个反序列化的步骤。RawComparator的做法是不进行反序列化,而是在字节流层面进行比较,这样就省下了反序列化过程,从而加速程序的运行。Hadoop自身提供的IntWritable、LongWritabe等类已经实现了这种优化,使这些Writable类作为键进行比较时,直接使用序列化的字节数组进行比较大小,而不用进行反序列化。
RawComparator的实现
在Hadoop中编写Writable的RawComparator一般不直接继承RawComparator类,而是继承RawComparator的子类WritableComparator,因为WritableComparator类为我们提供了一些有用的工具方法,比如从字节数组中读取int、long和vlong等值。下面是上两篇文章中我们定制的MyWritable类的RawComparator实现,定制的MyWritable由两个VLongWritable对组成,为了添加RawComparator功能,Writable类必须实现WritableComparable接口,这里不再展示实现了WritableComparable接口的MyWritableComparable类的全部内容,而只是MyWritableComparable类中Comparator的实现,完整的代码可以在github中找到。
...//omitted for conciseness
/**
* A RawComparator that compares serialized VlongWritable Pair
* compare method decode long value from serialized byte array one by one
*
* @author yoyzhou
*
* */
public static class Comparator extends WritableComparator {
public Comparator() {
super(MyWritableComparable.class);
}
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
int cmp = 1;
//determine how many bytes the first VLong takes
int n1 = WritableUtils.decodeVIntSize(b1[s1]);
int n2 = WritableUtils.decodeVIntSize(b2[s2]);
try {
//read value from VLongWritable byte array
long l11 = readVLong(b1, s1);
long l21 = readVLong(b2, s2);
cmp = l11 > l21 ? 1 : (l11 == l21 ? 0 : -1);
if (cmp != 0) {
return cmp;
} else {
long l12 = readVLong(b1, s1 + n1);
long l22 = readVLong(b2, s2 + n2);
return cmp = l12 > l22 ? 1 : (l12 == l22 ? 0 : -1);
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
static { // register this comparator
WritableComparator.define(MyWritableComparable.class, new Comparator());
}
...
通过上面的代码我们可以看到要实现Writable的RawComparator我们只需要重载WritableComparator的public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)
方法。在我们的例子中,通过从VLongWritable对序列化后字节数组中一个一个的读取VLongWritable的值,再进行比较。
当然编写完compare方法之后,不要忘了为Writable类注册编写的RawComparator类。
总结
为Writable类编写RawComparator必须对Writable本身序列化之后的字节数组有清晰的了解,知道如何从字节数组中读取Writable对象的值,而这正是我们前两篇关于Hadoop序列化和Writable接口的文章所要阐述的内容。
通过以上的三篇文章,我们了解了Hadoop Writable接口,如何编写自己的Writable类,Writable类的字节序列长度与其构成,以及如何为Writable类编写RawComparator来为Hadoop提速。
参考资料
Tom White, Hadoop: The Definitive Guide, 3rd Edition
Hadoop序列化与Writable接口(一)
Hadoop序列化与Writable接口(二)
--EOF--
原文地址:使用RawComparator加速Hadoop程序, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



The default map on the iPhone is Maps, Apple's proprietary geolocation provider. Although the map is getting better, it doesn't work well outside the United States. It has nothing to offer compared to Google Maps. In this article, we discuss the feasible steps to use Google Maps to become the default map on your iPhone. How to Make Google Maps the Default Map in iPhone Setting Google Maps as the default map app on your phone is easier than you think. Follow the steps below – Prerequisite steps – You must have Gmail installed on your phone. Step 1 – Open the AppStore. Step 2 – Search for “Gmail”. Step 3 – Click next to Gmail app

Magnet link is a link method for downloading resources, which is more convenient and efficient than traditional download methods. Magnet links allow you to download resources in a peer-to-peer manner without relying on an intermediary server. This article will introduce how to use magnet links and what to pay attention to. 1. What is a magnet link? A magnet link is a download method based on the P2P (Peer-to-Peer) protocol. Through magnet links, users can directly connect to the publisher of the resource to complete resource sharing and downloading. Compared with traditional downloading methods, magnetic

How to use mdf files and mds files With the continuous advancement of computer technology, we can store and share data in a variety of ways. In the field of digital media, we often encounter some special file formats. In this article, we will discuss a common file format - mdf and mds files, and introduce how to use them. First, we need to understand the meaning of mdf files and mds files. mdf is the extension of the CD/DVD image file, and the mds file is the metadata file of the mdf file.

CrystalDiskMark is a small HDD benchmark tool for hard drives that quickly measures sequential and random read/write speeds. Next, let the editor introduce CrystalDiskMark to you and how to use crystaldiskmark~ 1. Introduction to CrystalDiskMark CrystalDiskMark is a widely used disk performance testing tool used to evaluate the read and write speed and performance of mechanical hard drives and solid-state drives (SSD). Random I/O performance. It is a free Windows application and provides a user-friendly interface and various test modes to evaluate different aspects of hard drive performance and is widely used in hardware reviews

foobar2000 is a software that can listen to music resources at any time. It brings you all kinds of music with lossless sound quality. The enhanced version of the music player allows you to get a more comprehensive and comfortable music experience. Its design concept is to play the advanced audio on the computer The device is transplanted to mobile phones to provide a more convenient and efficient music playback experience. The interface design is simple, clear and easy to use. It adopts a minimalist design style without too many decorations and cumbersome operations to get started quickly. It also supports a variety of skins and Theme, personalize settings according to your own preferences, and create an exclusive music player that supports the playback of multiple audio formats. It also supports the audio gain function to adjust the volume according to your own hearing conditions to avoid hearing damage caused by excessive volume. Next, let me help you

Is the clock app missing from your phone? The date and time will still appear on your iPhone's status bar. However, without the Clock app, you won’t be able to use world clock, stopwatch, alarm clock, and many other features. Therefore, fixing missing clock app should be at the top of your to-do list. These solutions can help you resolve this issue. Fix 1 – Place the Clock App If you mistakenly removed the Clock app from your home screen, you can put the Clock app back in its place. Step 1 – Unlock your iPhone and start swiping to the left until you reach the App Library page. Step 2 – Next, search for “clock” in the search box. Step 3 – When you see “Clock” below in the search results, press and hold it and

NetEase Mailbox, as an email address widely used by Chinese netizens, has always won the trust of users with its stable and efficient services. NetEase Mailbox Master is an email software specially created for mobile phone users. It greatly simplifies the process of sending and receiving emails and makes our email processing more convenient. So how to use NetEase Mailbox Master, and what specific functions it has. Below, the editor of this site will give you a detailed introduction, hoping to help you! First, you can search and download the NetEase Mailbox Master app in the mobile app store. Search for "NetEase Mailbox Master" in App Store or Baidu Mobile Assistant, and then follow the prompts to install it. After the download and installation is completed, we open the NetEase email account and log in. The login interface is as shown below

Cloud storage has become an indispensable part of our daily life and work nowadays. As one of the leading cloud storage services in China, Baidu Netdisk has won the favor of a large number of users with its powerful storage functions, efficient transmission speed and convenient operation experience. And whether you want to back up important files, share information, watch videos online, or listen to music, Baidu Cloud Disk can meet your needs. However, many users may not understand the specific use method of Baidu Netdisk app, so this tutorial will introduce in detail how to use Baidu Netdisk app. Users who are still confused can follow this article to learn more. ! How to use Baidu Cloud Network Disk: 1. Installation First, when downloading and installing Baidu Cloud software, please select the custom installation option.
