Table of Contents
1、hbase压缩与编码的配置
2、相关测试
3、总体结论分析
Home Database Mysql Tutorial HBase实战系列1—压缩与编码技术

HBase实战系列1—压缩与编码技术

Jun 07, 2016 pm 04:30 PM
hbase compression Actual combat technology series coding

1、hbase压缩与编码的配置 安装LZO 解决方案: 1)apt-get install liblzo2-dev 2)hadoop-gpl-compression-0.2.0-dev.jar 放入classpath 把libgpl下的共享库文件放入/opt/hbase/hbase/lib/native/Linux-amd64-64/ libgplcompression.a libgplcompression.la

1、hbase压缩与编码的配置

安装LZO

解决方案:
1)apt-get install liblzo2-dev
2)hadoop-gpl-compression-0.2.0-dev.jar 放入classpath
把libgpl下的共享库文件放入/opt/hbase/hbase/lib/native/Linux-amd64-64/
libgplcompression.a libgplcompression.la libgplcompression.so libgplcompression.so.0 libgplcompression.so.0.0.0
3)配置:

io.compression.codecs
com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec


io.compression.codec.lzo.class
com.hadoop.compression.lzo.LzoCodec

4)测试:
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs:///user.dat lzo

创建表格时,针对ColumnFamily设置压缩和编码方式。

HColumnDescriptor.setCompressionType(Compression.Algorithm.NONE);

HColumnDescriptor.setDataBlockEncoding(DataBlockEncoding.NONE);

使用FAST_DIFF 与 LZO之后的压缩情况:

hbase@GS-WDE-SEV0151:/opt/hbase/hbase$ hadoop fs -dus /hbase-weibo/weibo_test
hdfs://hbase-hdfs.goso.cn:9000/hbase-weibo/weibo_test???? 1021877013
hbase@GS-WDE-SEV0151:/opt/hbase/hbase$ hadoop fs -dus /hbase-weibo/weibo_lzo
hdfs://hbase-hdfs.goso.cn:9000/hbase-weibo/weibo_lzo???? 1179175365
hbase@GS-WDE-SEV0151:/opt/hbase/ops$ hadoop fs -dus /hbase-weibo/weibo_diff
hdfs://hbase-hdfs.goso.cn:9000/hbase-weibo/weibo_diff???? 2754679243

hbase@GS-WDE-SEV0151:/opt/hbase/hbase$ hadoop fs -dus /hbase-weibo/weibo-new
hdfs://hbase-hdfs.goso.cn:9000/hbase-weibo/weibo-new???? 5270708315

忽略数据中出现的Delete的数据、多个版本、以及超时的数据,压缩比达到1:5。

单独使用LZO的配置的压缩可接近也接近5:1的压缩比。

单独使用FAST_DIFF编码可以接近5:2的压缩比。

HBase操作过程:

Finish DataBlock–>Encoding DataBlock(FAST_DIFF\PREFIX\PREFIX_TRIE\DIFF)—>Compression DataBlock(LZO\GZ) —>Flush到磁盘。

如果Encoding和Compression的方式都设置NONE,中间的过程即可忽略。

2、相关测试

weibo-new使用的NONE、NONE

weibo_test使用的LZO、FAST_DIFF

weibo_diff使用了FAST_DIFF

weibo_lzo使用了LZO压缩

1、测试 扫描的效率:

个数 耗时
weibo_test 2314054 ??3m49.661s
weibo-new 2314054 ??1m55.349s
weibo_lzo 2314054 ? 3m24.378s
weibo_diff 2314054 ?4m41.792s

结果分析:

使用LZO压缩或者FAST_DIFF的编码,扫描时造成大概一倍的开销

这个原因在于:在当前存储容量下,网络IO不是瓶颈,使用基本配置weibo-new吞吐量达到了45.68MB/s,而使用LZO压缩,显然经过一次或者两次解码之后,消耗了一些CPU时间片,从而耗时较长。


2、随机读的效率,采用单条随机的办法

首先scan出所有的Row,然后,使用shuf -n1000000 /tmp/row 随机取出1000000个row,然后按照单线程随机读的方式获取。

ps:每个Record有3个ColumnFamily,共有31个Column。

个数 耗时
weibo_test 100,0000 122min12s, 平均7.3ms/Record
weibo-new 100,0000 68min40s, 平均3.99ms/Record
weibo_lzo 100,0000 83m26.539s, 平均5.00ms/Record
weibo_diff 100,0000 58m5.915s, ?平均3.48ms/Record

结果分析:

1)LZO解压缩的效率低于反解码的效率,在不以存储代价为第一考虑的情况下,优先选择FAST_DIFF编码方式。

2)LZO随机读会引起 hbase内部更多的读开销。下图在读取同样数据过程中,通过对于RegionServer上scanner采集到的读取个数,lzo明显代价较大。

3)在数据量不超过1T,并且HBase集群内存可以完全cover住整个Cache的情况下,可以不做压缩或者编码的设置,一般带有ROWCOL的bloomfilter基本就可以达到系统最佳的状态。如果数据远远大于Cache总量的10倍以上,优先使用编码方案(FAST_DIFF或者0.96引入的PREFIX_TRIE)

3、随机写的效率,采用批量写。批量个数为100

个数 耗时
weibo_test 8640447 571670ms, 66μs/Put, 6.61ms/batch
weibo-new 8640447 329694ms,38.12μs/Put,? 3.81ms/batch
weibo_lzo 8640447 295770ms, 34.23μs/Put, 3.42ms/batch
weibo_diff 8640447 250399ms, 28.97μs/Put,2.90ms/batch

lz vs diff 写操作的集群吞吐图(两者开始执行的时间起点不同, 绿线代表weibo_diff、红线是weibo_lzo)

?

结论分析:

1)批量写操作,使用FAST_DIFF编码的开销最小,性能比不做任何配置(weibo-new)有24%提升。

2)使用diff,lzo双重配置,批量写操作有较大开销,并且压缩没有比单独使用LZO压缩有明显提升,所以不建议同时使用。

3、总体结论分析

1)在column较多、并且value较短的情况下,使用FAST_DIFF可以获得较好的压缩空间以及较优的读写延迟。推荐使用。

2)在对于存储空间比较紧缺的应用,单独使用LZO压缩,可以在牺牲一些随机读的前提下获得较高的空间压缩率(5:1)。

备注:本系列文章属于Binos_ICT在Binospace个人技术博客原创,原文链接为http://www.binospace.com/index.php/hbase-combat-series-1-compression-and-coding-techniques/?,未经允许,不得在网上转载。

From Binospace, post HBase实战系列1—压缩与编码技术

文章的脚注信息由WordPress的wp-posturl插件自动生成


Copyright © 2008
This feed is for personal, non-commercial use only.
The use of this feed on other websites breaches copyright. If this content is not in your news reader, it makes the page you are viewing an infringement of the copyright. (Digital Fingerprint:
)
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

This article is enough for you to read about autonomous driving and trajectory prediction! This article is enough for you to read about autonomous driving and trajectory prediction! Feb 28, 2024 pm 07:20 PM

Trajectory prediction plays an important role in autonomous driving. Autonomous driving trajectory prediction refers to predicting the future driving trajectory of the vehicle by analyzing various data during the vehicle's driving process. As the core module of autonomous driving, the quality of trajectory prediction is crucial to downstream planning control. The trajectory prediction task has a rich technology stack and requires familiarity with autonomous driving dynamic/static perception, high-precision maps, lane lines, neural network architecture (CNN&GNN&Transformer) skills, etc. It is very difficult to get started! Many fans hope to get started with trajectory prediction as soon as possible and avoid pitfalls. Today I will take stock of some common problems and introductory learning methods for trajectory prediction! Introductory related knowledge 1. Are the preview papers in order? A: Look at the survey first, p

The Stable Diffusion 3 paper is finally released, and the architectural details are revealed. Will it help to reproduce Sora? The Stable Diffusion 3 paper is finally released, and the architectural details are revealed. Will it help to reproduce Sora? Mar 06, 2024 pm 05:34 PM

StableDiffusion3’s paper is finally here! This model was released two weeks ago and uses the same DiT (DiffusionTransformer) architecture as Sora. It caused quite a stir once it was released. Compared with the previous version, the quality of the images generated by StableDiffusion3 has been significantly improved. It now supports multi-theme prompts, and the text writing effect has also been improved, and garbled characters no longer appear. StabilityAI pointed out that StableDiffusion3 is a series of models with parameter sizes ranging from 800M to 8B. This parameter range means that the model can be run directly on many portable devices, significantly reducing the use of AI

7-zip maximum compression rate setting, how to compress 7zip to the minimum 7-zip maximum compression rate setting, how to compress 7zip to the minimum Jun 18, 2024 pm 06:12 PM

I found that the compressed package downloaded from a download website will be larger than the original compressed package after decompression. The difference is tens of Kb for a small one and several dozen Mb for a large one. If it is uploaded to a cloud disk or paid space, it does not matter if the file is small. , if there are many files, the storage cost will be greatly increased. I studied it specifically and can learn from it if necessary. Compression level: 9-Extreme compression Dictionary size: 256 or 384, the more compressed the dictionary, the slower it is. The compression rate difference is larger before 256MB, and there is no difference in compression rate after 384MB. Word size: maximum 273 Parameters: f=BCJ2, test and add parameter compression rate will be higher

DualBEV: significantly surpassing BEVFormer and BEVDet4D, open the book! DualBEV: significantly surpassing BEVFormer and BEVDet4D, open the book! Mar 21, 2024 pm 05:21 PM

This paper explores the problem of accurately detecting objects from different viewing angles (such as perspective and bird's-eye view) in autonomous driving, especially how to effectively transform features from perspective (PV) to bird's-eye view (BEV) space. Transformation is implemented via the Visual Transformation (VT) module. Existing methods are broadly divided into two strategies: 2D to 3D and 3D to 2D conversion. 2D-to-3D methods improve dense 2D features by predicting depth probabilities, but the inherent uncertainty of depth predictions, especially in distant regions, may introduce inaccuracies. While 3D to 2D methods usually use 3D queries to sample 2D features and learn the attention weights of the correspondence between 3D and 2D features through a Transformer, which increases the computational and deployment time.

Xiaomi 15 series full codenames revealed: Dada, Haotian, Xuanyuan Xiaomi 15 series full codenames revealed: Dada, Haotian, Xuanyuan Aug 22, 2024 pm 06:47 PM

The Xiaomi Mi 15 series is expected to be officially released in October, and its full series codenames have been exposed in the foreign media MiCode code base. Among them, the flagship Xiaomi Mi 15 Ultra is codenamed "Xuanyuan" (meaning "Xuanyuan"). This name comes from the Yellow Emperor in Chinese mythology, which symbolizes nobility. Xiaomi 15 is codenamed "Dada", while Xiaomi 15Pro is named "Haotian" (meaning "Haotian"). The internal code name of Xiaomi Mi 15S Pro is "dijun", which alludes to Emperor Jun, the creator god of "The Classic of Mountains and Seas". Xiaomi 15Ultra series covers

PHP Practical: Code Example to Quickly Implement Fibonacci Sequence PHP Practical: Code Example to Quickly Implement Fibonacci Sequence Mar 20, 2024 pm 02:24 PM

PHP Practice: Code Example to Quickly Implement the Fibonacci Sequence The Fibonacci Sequence is a very interesting and common sequence in mathematics. It is defined as follows: the first and second numbers are 0 and 1, and from the third Starting with numbers, each number is the sum of the previous two numbers. The first few numbers in the Fibonacci sequence are 0,1,1.2,3,5,8,13,21,...and so on. In PHP, we can generate the Fibonacci sequence through recursion and iteration. Below we will show these two

The best time to buy Huawei Mate 60 series, new AI elimination + image upgrade, and enjoy autumn promotions The best time to buy Huawei Mate 60 series, new AI elimination + image upgrade, and enjoy autumn promotions Aug 29, 2024 pm 03:33 PM

Since the Huawei Mate60 series went on sale last year, I personally have been using the Mate60Pro as my main phone. In nearly a year, Huawei Mate60Pro has undergone multiple OTA upgrades, and the overall experience has been significantly improved, giving people a feeling of being constantly new. For example, recently, the Huawei Mate60 series has once again received a major upgrade in imaging capabilities. The first is the new AI elimination function, which can intelligently eliminate passers-by and debris and automatically fill in the blank areas; secondly, the color accuracy and telephoto clarity of the main camera have been significantly upgraded. Considering that it is the back-to-school season, Huawei Mate60 series has also launched an autumn promotion: you can enjoy a discount of up to 800 yuan when purchasing the phone, and the starting price is as low as 4,999 yuan. Commonly used and often new products with great value

How to compress a folder and send it in wps How to compress a folder and send it in wps Mar 20, 2024 pm 12:58 PM

Office workers use wps software very frequently at work. Sometimes they input multiple files a day and then send them to the leader or to a designated location. So how does wps software compress a folder and package it for sending? The editor below will teach you. This operation step. First, organize the files and folders you want to send into the same folder. If you have a lot of files, it's a good idea to name each file so it's easier to identify when sending. Second step, this time click on this large folder and then right-click. Select "Add to archive". Step 3: At this time, the software will automatically help us package our files. Select "Compress to XX.zip". This zip is the packaging format, and then click Compress Now.​

See all articles