深入解析MapReduce架构设计与实现原理–读书笔记(4)MR及Partitio
MR解析 Mapper/Reducer封装了应用程序的数据处理逻辑。 所有存储在底层分布式文件系统上的数据均要解释成key/value的形式。并交给MR中的map/reduce函数处理,产生另外一些key/value。 Mapper 1)初始化 Mapper继承了JobConfigurable接口。该config方法允许通
MR解析
Mapper/Reducer封装了应用程序的数据处理逻辑。
所有存储在底层分布式文件系统上的数据均要解释成key/value的形式。并交给MR中的map/reduce函数处理,产生另外一些key/value。
Mapper
1)初始化
Mapper继承了JobConfigurable接口。该config方法允许通过JobConf参数对Mapper进行初始化。
2)Map操作
MapReduce会通过InputFormat中RecordReader从InputSplit获取一个key/value对,并交给map()函数处理:
void map(K1 key,V2 value,OutputCollector
3)清理
Mapper通过继承Colseable获得close方法,用户可通过实现该方法对Mapper进行清理。
Mapper类型
ChainMapper 链式作业;IdentityMapper对于输入不进行任何处理,直接输出;InvertMapper 交换key/value位置;
RegexMapper 正则表达式字符串分割;TokenMapper 将字符串分割成若干个token,可用作wordCount的Mapper;
LongSumReducer:以key为组,对long类型的value求累加和。
新的Mapper由接口变为抽象类;不再继承JobConfigurable和Closeable,而是直接在类中添加了setup和cleanup两个方法进行初始化和清理工作。
将参数封装到Context对象中,接口具有良好扩展性。
去掉MapRunnable接口,在Mapper中添加run方法,以方便用户定制map()函数的调用方法。
新API中,Reducer遍历value的迭代器类型变为Iterable
void reduce(KEYIN key,Iteratable values,Context context) throws IOException,InterrupteException{for(VALUEIN value:values){ context.write((KEYOUT) key,(VALUEOUT) value);}}
Partitioner接口的设计与实现
Partitioner的作用是对Mapper产生的中间结果进行分片,以便将同一分组的数据交给同一个Reducer处理,它直接影响Reduce阶段的负载均衡。
只包含一个待实现的方法getPartition。该方法包含3个参数,均由框架自传入,前面2个参数是key/value,第三个参数numPartitions表示每个Mapper的分片数,
也就是Reducer的个数。
HashPartitioner和TotalOrderPartitioner。其中HashPartitioner是默认实现:public int getPartition(K2 key,V2 value,int numReduceTasks){return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks ;}
TotalOrderPartitioner提供了一种基于区间的分片方法,通常用在数据全排序中,归并排序。
在Map阶段,每个MapTask进行局部排序;在Reduce阶段,启动一个ReduceTask进行全局排序。由于作业只能有一个ReduceTask,因此会产生瓶颈。
TotalOrderPartitioner按照大小将数据分成若干个区间,并保证后一个区间的所有数据均大于前一个区间数据。
步骤1:数据采样。
在client端通过采样获取分片的分割点。
采样数据:b,abc,abd,bcd,abcd,efg,hii,afd,rrr,mnk
排序后:abc,abcd,abd,afd,b,bcd,efg,hii,mnk,rrr
如果有4个Reduce Task,则采样数据的四等分点为abd,bcd,mnk
步骤2:Map阶段。
Mapper可采用IdentityMapper直接将输入数据输出,TotalOrderPartitioner将步骤1中获取的分割点保存到trie树中以便快速定位任意一个记录所在的区间,这样每个
Map Task产生R个区间,且区间中间有序。
步骤3:Reduce阶段。
每个Reducer对分配到的区间数据进行局部排序,最终得到全排序数据。
TotalOrderPartitioner有2个典型应用实例;TeraSort和HBase。
HBase内部数据有序,Region之间也有序。
原文地址:深入解析MapReduce架构设计与实现原理–读书笔记(4)MR及Partitioner, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

How to implement dual WeChat login on Huawei mobile phones? With the rise of social media, WeChat has become one of the indispensable communication tools in people's daily lives. However, many people may encounter a problem: logging into multiple WeChat accounts at the same time on the same mobile phone. For Huawei mobile phone users, it is not difficult to achieve dual WeChat login. This article will introduce how to achieve dual WeChat login on Huawei mobile phones. First of all, the EMUI system that comes with Huawei mobile phones provides a very convenient function - dual application opening. Through the application dual opening function, users can simultaneously

According to news on April 17, HMD teamed up with the well-known beer brand Heineken and the creative company Bodega to launch a unique flip phone - The Boring Phone. This phone is not only full of innovation in design, but also returns to nature in terms of functionality, aiming to lead people back to real interpersonal interactions and enjoy the pure time of drinking with friends. Boring mobile phone adopts a unique transparent flip design, showing a simple yet elegant aesthetic. It is equipped with a 2.8-inch QVGA display inside and a 1.77-inch display outside, providing users with a basic visual interaction experience. In terms of photography, although it is only equipped with a 30-megapixel camera, it is enough to handle simple daily tasks.

According to news on April 26, ZTE’s 5G portable Wi-Fi U50S is now officially on sale, starting at 899 yuan. In terms of appearance design, ZTE U50S Portable Wi-Fi is simple and stylish, easy to hold and pack. Its size is 159/73/18mm and is easy to carry, allowing you to enjoy 5G high-speed network anytime and anywhere, achieving an unimpeded mobile office and entertainment experience. ZTE 5G portable Wi-Fi U50S supports the advanced Wi-Fi 6 protocol with a peak rate of up to 1800Mbps. It relies on the Snapdragon X55 high-performance 5G platform to provide users with an extremely fast network experience. Not only does it support the 5G dual-mode SA+NSA network environment and Sub-6GHz frequency band, the measured network speed can even reach an astonishing 500Mbps, which is easily satisfactory.

According to news on April 3, Taipower’s upcoming M50 Mini tablet computer is a device with rich functions and powerful performance. This new 8-inch small tablet is equipped with an 8.7-inch IPS screen, providing users with an excellent visual experience. Its metal body design is not only beautiful but also enhances the durability of the device. In terms of performance, the M50Mini is equipped with the Unisoc T606 eight-core processor, which has two A75 cores and six A55 cores, ensuring a smooth and efficient running experience. At the same time, the tablet is also equipped with a 6GB+128GB storage solution and supports 8GB memory expansion, which meets users’ needs for storage and multi-tasking. In terms of battery life, M50Mini is equipped with a 5000mAh battery and supports Ty

According to news on July 12, the Honor Magic V3 series was officially released today, equipped with the new Honor Vision Soothing Oasis eye protection screen. While the screen itself has high specifications and high quality, it also pioneered the introduction of AI active eye protection technology. It is reported that the traditional way to alleviate myopia is "myopia glasses". The power of myopia glasses is evenly distributed to ensure that the central area of sight is imaged on the retina, but the peripheral area is imaged behind the retina. The retina senses that the image is behind, promoting the eye axis direction. grow later, thereby deepening the degree. At present, one of the main ways to alleviate the development of myopia is the "defocus lens". The central area has a normal power, and the peripheral area is adjusted through optical design partitions, so that the image in the peripheral area falls in front of the retina.

How to implement the WeChat clone function on Huawei mobile phones With the popularity of social software and people's increasing emphasis on privacy and security, the WeChat clone function has gradually become the focus of people's attention. The WeChat clone function can help users log in to multiple WeChat accounts on the same mobile phone at the same time, making it easier to manage and use. It is not difficult to implement the WeChat clone function on Huawei mobile phones. You only need to follow the following steps. Step 1: Make sure that the mobile phone system version and WeChat version meet the requirements. First, make sure that your Huawei mobile phone system version has been updated to the latest version, as well as the WeChat App.

Analysis of the role and principle of nohup In Unix and Unix-like operating systems, nohup is a commonly used command that is used to run commands in the background. Even if the user exits the current session or closes the terminal window, the command can still continue to be executed. In this article, we will analyze the function and principle of the nohup command in detail. 1. The role of nohup: Running commands in the background: Through the nohup command, we can let long-running commands continue to execute in the background without being affected by the user exiting the terminal session. This needs to be run

SpringDataJPA is based on the JPA architecture and interacts with the database through mapping, ORM and transaction management. Its repository provides CRUD operations, and derived queries simplify database access. Additionally, it uses lazy loading to only retrieve data when necessary, thus improving performance.
