Xiaoyuanquan (小猿圈) on Hadoop Optimization - A Programmer's Summary


聆听's blog

Original


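Hadoop is one of the most widely used frameworks today, and more and more people are learning it. How well do you understand it? Knowing Hadoop also means knowing how to tune it, so this post from Xiaoyuanquan (小猿圈) walks through the main optimization points. Read on if you are interested.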

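1. Efficiency bottlenecks in MapReduce programs

Purpose: distributed offline computation

Machine resources: CPU, memory, disk, network

I/O-related bottlenecks

(1) Data skew (addressed in code)

(2) Poorly chosen numbers of map and reduce tasks

(3) Map tasks that run too long, leaving reduce tasks waiting

(4) Too many small files (merge them with CombineTextInputFormat)

(5) Very large files that cannot be split (they cause repeated spills)

(6) Many small spill files that need several merge passes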

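2. MapReduce optimization methods

There are six areas to consider: data input, the map phase, the reduce phase, I/O transfer, data skew, and parameter tuning.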

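1> Data input

(1) Merge small files before the MapReduce job runs.

(2) Use CombineTextInputFormat as the input format when the input contains a large number of small files.

MapReduce is not well suited to processing large numbers of small files: by default each file produces at least one split, and therefore its own map task, so task startup overhead dominates the actual work.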
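As a rough illustration of item (2), the fragment below selects CombineTextInputFormat through configuration and caps the size of each combined split. The 128 MB cap is an assumed value chosen for illustration, not a figure from this post, and many jobs set the input format in the driver with Job.setInputFormatClass rather than through a property.

```xml
<!-- Sketch: pack many small files into fewer, larger splits.
     These properties go inside <configuration>. -->

<!-- Input format class used by the job (new MapReduce API) -->
<property>
    <name>mapreduce.job.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat</value>
</property>

<!-- Upper bound on one combined split, in bytes (assumed: 128 MB) -->
<property>
    <name>mapreduce.input.fileinputformat.split.maxsize</name>
    <value>134217728</value>
</property>
```

With a cap like this, small files are grouped until a split approaches the limit, so the job starts far fewer map tasks than one per file.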

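2> Map phase

(1) Reduce the number of spills by enlarging the in-memory sort buffer (for example to 200 MB) and keeping the spill threshold at 80%. The example shows both properties at their default values.

Example

    <!-- In-memory sort buffer for map output, in MB (default 100) -->
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>100</value>
    </property>
    <!-- Fraction of the buffer that triggers a spill to disk (default 0.80) -->
    <property>
        <name>mapreduce.map.sort.spill.percent</name>
        <value>0.80</value>
    </property>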


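(2) Reduce the number of merge passes by raising the number of spill files merged at once.

Example

    <!-- Number of streams merged at once while sorting files (default 10) -->
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
    </property>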


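(3) After the map step, run a combiner whenever doing so does not change the business logic; it shrinks the map output before the shuffle.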

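3> Reduce phase

(1) Choose the number of map and reduce tasks sensibly. Too few tasks leave work queued and lengthen the job; too many make tasks compete for resources and add startup overhead.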
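A minimal sketch of how these counts are usually influenced through configuration: the number of reduce tasks is set directly, while the number of map tasks follows from the input split size. Both values below are assumptions for illustration only.

```xml
<!-- Number of reduce tasks for the job (default 1; 20 is an assumed value) -->
<property>
    <name>mapreduce.job.reduces</name>
    <value>20</value>
</property>

<!-- Minimum split size in bytes; raising it above the block size yields
     fewer, larger map tasks (assumed: 256 MB) -->
<property>
    <name>mapreduce.input.fileinputformat.split.minsize</name>
    <value>268435456</value>
</property>
```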

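(2) Let the map and reduce phases overlap

Start the reduce tasks once a given fraction of the map tasks has completed, so that reducers spend less time waiting.

Example

    <!-- Fraction of map tasks that must complete before reducers are scheduled (default 0.05) -->
    <property>
        <name>mapreduce.job.reduce.slowstart.completedmaps</name>
        <value>0.05</value>
    </property>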


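(3) Size the reduce-side buffer sensibly

Example

    <!-- Share of the heap used to cache values for the mark-reset feature (default 0.0) -->
    <property>
        <name>mapreduce.reduce.markreset.buffer.percent</name>
        <value>0.0</value>
    </property>

Note that mapreduce.reduce.markreset.buffer.percent only sizes the cache used when the values iterator is marked and reset. The property that keeps map outputs in memory during the reduce is mapreduce.reduce.input.buffer.percent, which also defaults to 0.0.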


4> I/O transfer

(1) Compress the data

(2) Use SequenceFile
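Both points can be expressed in configuration. The sketch below compresses the intermediate map output and the final job output. The choice of Snappy is an assumption (depending on the Hadoop version it may need native library support on the cluster), and block-level compression applies when the output is written as a SequenceFile.

```xml
<!-- Compress map output before it is shuffled to the reducers -->
<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<!-- Compress the final job output -->
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>

<!-- For SequenceFile output: compress blocks of records rather than single records -->
<property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
</property>
```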

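5> Data skew

(1) Use range partitioning

(2) Use a custom partitioner

(3) Use a combiner

(4) Whenever a map-side join is possible, use it instead of a reduce-side join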
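For item (1), one possible sketch is Hadoop's TotalOrderPartitioner, which assigns keys to reducers by range. This is an assumption about how range partitioning might be wired up, not the method described in this post. The partitioner reads its range boundaries from a partition file, which is normally produced in the driver by sampling the input (for example with InputSampler), so this fragment alone is not enough to run a job.

```xml
<!-- Sketch: partition keys by range instead of by hash.
     Assumes the driver has already written the partition file
     that TotalOrderPartitioner reads its split points from. -->
<property>
    <name>mapreduce.job.partitioner.class</name>
    <value>org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner</value>
</property>
```

Because the boundaries come from a sample of the real data, heavily populated key ranges can be split across more reducers than a plain hash partitioner would give them.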

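6> Parameter tuning

Setting CPU cores

Cores per map task:

Example

    <!-- Virtual cores requested for each map task (default 1) -->
    <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>1</value>
    </property>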


Cores per reduce task:

Example

    <!-- Virtual cores requested for each reduce task (default 1) -->
    <property>
        <name>mapreduce.reduce.cpu.vcores</name>
        <value>1</value>
    </property>


Setting memory

Memory per map task:

Example

    <!-- Memory requested for each map task, in MB -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
    </property>


Memory per reduce task:

Example

    <!-- Memory requested for each reduce task, in MB -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
    </property>
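The memory settings above bound the whole task container, so the JVM heap is normally set somewhat lower. The fragment below is a sketch that assumes 2048 MB containers with roughly 80% of that given to the heap; the numbers are illustrative and do not come from this post.

```xml
<!-- Map task: container size and JVM heap (assumed: about 80% of the container) -->
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
</property>

<!-- Reduce task: same assumed ratio -->
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1638m</value>
</property>
```

The sort buffer (mapreduce.task.io.sort.mb) is allocated from the map task's heap, so it has to fit comfortably below the -Xmx value.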

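Parallelism with which a reduce task fetches data from the map side:

Example

    <!-- Number of parallel copy threads a reducer uses during the shuffle (default 5) -->
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>5</value>
    </property>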


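Those are the Hadoop optimization points Xiaoyuanquan wanted to cover. What do you think? If you know of other optimization methods, feel free to add them. If you found this useful, Xiaoyuanquan has material on other topics as well, and we hope it helps you build a more complete picture.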
