HFile文件格式与HBase读写
HFile是HBase存储数据的文件组织形式。HFile文件的特点: 1)HFile由DataBlock、Meta信息(Index、BloomFilter)、Info等信息组成。 2)整个DataBlock由一个或者多个KeyValue组成。 3)在文件内按照Key排序。 HFile V1的数据组织格式: DataBlock区域、MetaBlo
HFile是HBase存储数据的文件组织形式。HFile文件的特点:
1)HFile由DataBlock、Meta信息(Index、BloomFilter)、Info等信息组成。
2)整个DataBlock由一个或者多个KeyValue组成。
3)在文件内按照Key排序。
HFile V1的数据组织格式:
DataBlock区域、MetaBlock(bloomfilter) 与FileInfo、DataBlockIndex、MetaBlockIndex、Trailer分离。
打开一个HFile文件需要加载FileInfo、DataBlockIndex、MetablockIndex以及Fixed File Trailer到内存。
如下图所示:
HFile V1的数据格式在0.92版本升级到V2版本,
HFile V2的数据组织格式如下图所示:
与V1版本的相比,它的区别在于
1)文件分为三部分:Scanned block section,Non-scanned block section,以及Opening-time data section
2) 为DataBlockIndex建立多层索引。DataBlockIndex分为Leaf Index Block、Root Data Index(或者multi Root Data index(紫色的Meta Index区域)),Leaf index block具体存储了DataBlock的offset、length、以及firstkey的信息。RootDataIndex 存储的是每个Leaf index block的offset、length、Leaf index Block记录的第一个key,以及截至到该Leaf Index Block记录的DataBlock的个数。假定DataBlock的个数足够多,HFile文件又足够大的情况下,默认的128KB的长度的ROOTDataIndex仍然存在超过chunk大小的情况时,会分成更多的层次。这样最终的可能是ROOT INDEX –> IntermediateLevel ROOT INDEX(可以是多层) —〉Leaf index block
在ROOT INDEX中会记录Mid Key所对应的信息,帮助在做File Split或者折半查询时快速定位中间Row的信息。
//追加Split操作的相关知识:Region在执行Split操作,默认选择Region当中最大Store下的最大Storefile文件中的midkey,而midkey其实只是在通过HFile获取了这个文件之前记录好的数据。在自动触发Split操作的前提下,大部分的Split操作都伴随在Compaction操作之后进行的原因,在于可以对于Region中的文件进行合并,生成较大的StoreFile文件,以方便选择更好的Split Point。
HFile V2的写操作流程:
1)Append KV到 Data Block。在每次Append之前,首先检查当前DataBlock的大小是否超过了默认的设置,如果不超出阈值,写入输出流。如果超出了阈值,则执行finishBlock(),按照Table-CF的设置,对DataBlock进行编码和压缩,然后写入HFile中。//以Block为单位进行编码和压缩,会有一些性能开销,可以参考HBase实战系列1—压缩与编码技术
2)根据数据的规模,写入Leaf index block和Bloom block。
Leaf index Block,每次Flush一个DataBlock会在该Block上添加一条记录,并判断该Block的大小是否超过阈值(默认128KB),超出阈值的情况下,会在DataBlock之后写入一个Leaf index block。对应的控制类:HFileBlockIndex,内置了BlockIndexChunk、BlockIndexReader和BlockIndexWriter(实现了InlineBlockWriter接口)。
Bloom Block设置:默认使用MURMUR hash策略,每个Block的默认大小为128KB,每个BloomBlock可以接收的Key的个数通过如下的公式计算,接收的key的个数 与block的容量以及errorRate的之间存在一定的关系,如下的计算公式中,可以得到在系统默认的情况下,每个BloomBlock可以接纳109396个Key。
注意:影响BloomBlock个数的因素,显然受到HFile内KeyValue个数、errorRate、以及BlockSize大小的影响。可以根据应用的需求合理调整相关控制参数。
<span style="color: #008000; font-style: italic; font-weight: bold;">/** * The maximum number of keys we can put into a Bloom filter of a certain * size to maintain the given error rate, assuming the number of hash * functions is chosen optimally and does not even have to be an integer * (hence the "ideal" in the function name). * * @param bitSize * @param errorRate * @return maximum number of keys that can be inserted into the Bloom filter * @see #computeMaxKeys(long, double, int) for a more precise estimate */</span> <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">long</span> idealMaxKeys<span style="color: #009900;">(</span><span style="color: #000066; font-weight: bold;">long</span> bitSize, <span style="color: #000066; font-weight: bold;">double</span> errorRate<span style="color: #009900;">)</span> <span style="color: #009900;">{</span> <span style="color: #666666; font-style: italic;">// The reason we need to use floor here is that otherwise we might put</span> <span style="color: #666666; font-style: italic;">// more keys in a Bloom filter than is allowed by the target error rate.</span> <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #009900;">(</span><span style="color: #000066; font-weight: bold;">long</span><span style="color: #009900;">)</span> <span style="color: #009900;">(</span>bitSize <span style="color: #339933;">*</span> <span style="color: #009900;">(</span>LOG2_SQUARED <span style="color: #339933;">/</span> <span style="color: #339933;">-</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">log</span><span style="color: #009900;">(</span>errorRate<span style="color: #009900;">)</span><span style="color: #009900;">)</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//这里的bitSize是byteSizeHint *8,如果按照默认设置,大概是128*1024*8 *(Math.log(2)*Math.log(2)/-Math.log(0.01)) = 109396 .</span> <span style="color: #009900;">}</span>
每一个BloomBlock会对应index信息,存储在Meta Index区域。
这样在加载数据的时候,只需加载不超过128KB的RootDataIndex以及IntermediateLevelRootIndex,而避免加载如HFile V1的所有的Leaf index block信息,同样,也只需要加载BloomBlockIndex信息到内存,这样避免在HFile V1格式因为加载过大的DataBlockIndex造成的开销,加快Region的加载速度。
From Binospace, post HFile文件格式与HBase读写
文章的脚注信息由WordPress的wp-posturl插件自动生成
Copyright © 2008
This feed is for personal, non-commercial use only.
The use of this feed on other websites breaches copyright. If this content is not in your news reader, it makes the page you are viewing an infringement of the copyright. (Digital Fingerprint:
)

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Open WeChat, select Settings in Me, select General and then select Storage Space, select Management in Storage Space, select the conversation in which you want to restore files and select the exclamation mark icon. Tutorial Applicable Model: iPhone13 System: iOS15.3 Version: WeChat 8.0.24 Analysis 1 First open WeChat and click the Settings option on the My page. 2 Then find and click General Options on the settings page. 3Then click Storage Space on the general page. 4 Next, click Manage on the storage space page. 5Finally, select the conversation in which you want to recover files and click the exclamation mark icon on the right. Supplement: WeChat files generally expire in a few days. If the file received by WeChat has not been clicked, the WeChat system will clear it after 72 hours. If the WeChat file has been viewed,

In Windows, the Photos app is a convenient way to view and manage photos and videos. Through this application, users can easily access their multimedia files without installing additional software. However, sometimes users may encounter some problems, such as encountering a "This file cannot be opened because the format is not supported" error message when using the Photos app, or file corruption when trying to open photos or videos. This situation can be confusing and inconvenient for users, requiring some investigation and fixes to resolve the issues. Users see the following error when they try to open photos or videos on the Photos app. Sorry, Photos cannot open this file because the format is not currently supported, or the file

Tmp format files are a temporary file format usually generated by a computer system or program during execution. The purpose of these files is to store temporary data to help the program run properly or improve performance. Once the program execution is completed or the computer is restarted, these tmp files are often no longer necessary. Therefore, for Tmp format files, they are essentially deletable. Moreover, deleting these tmp files can free up hard disk space and ensure the normal operation of the computer. However, before deleting Tmp format files, we need to

When deleting or decompressing a folder on your computer, sometimes a prompt dialog box "Error 0x80004005: Unspecified Error" will pop up. How should you solve this situation? There are actually many reasons why the error code 0x80004005 is prompted, but most of them are caused by viruses. We can re-register the dll to solve the problem. Below, the editor will explain to you the experience of handling the 0x80004005 error code. Some users are prompted with error code 0X80004005 when using their computers. The 0x80004005 error is mainly caused by the computer not correctly registering certain dynamic link library files, or by a firewall that does not allow HTTPS connections between the computer and the Internet. So how about

The gho file is a GhostImage image file, which is usually used to back up the entire hard disk or partition data into a file. In some specific cases, we need to reinstall this gho file back to the hard drive to restore the hard drive or partition to its previous state. The following will introduce how to install the gho file. First, before installation, we need to prepare the following tools and materials: Entity gho file: Make sure you have a complete gho file, which usually has a .gho suffix and contains a backup

Quark Netdisk and Baidu Netdisk are currently the most commonly used Netdisk software for storing files. If you want to save the files in Quark Netdisk to Baidu Netdisk, how do you do it? In this issue, the editor has compiled the tutorial steps for transferring files from Quark Network Disk computer to Baidu Network Disk. Let’s take a look at how to operate it. How to save Quark network disk files to Baidu network disk? To transfer files from Quark Network Disk to Baidu Network Disk, you first need to download the required files from Quark Network Disk, then select the target folder in the Baidu Network Disk client and open it. Then, drag and drop the files downloaded from Quark Cloud Disk into the folder opened by the Baidu Cloud Disk client, or use the upload function to add the files to Baidu Cloud Disk. Make sure to check whether the file was successfully transferred in Baidu Cloud Disk after the upload is completed. That's it

A file path is a string used by the operating system to identify and locate a file or folder. In file paths, there are two common symbols separating paths, namely forward slash (/) and backslash (). These two symbols have different uses and meanings in different operating systems. The forward slash (/) is a commonly used path separator in Unix and Linux systems. On these systems, file paths start from the root directory (/) and are separated by forward slashes between each directory. For example, the path /home/user/Docume

QQ email: QQ number@qq.com, English QQ email: English or numbers@qq.com, foxmail email account: set up your own account@foxmail.com, mobile phone email account: mobile phone number@qq.com. Tutorial Applicable Model: iPhone13 System: IOS15.3 Version: QQ Mailbox 6.3.3 Analysis 1QQ mailbox has four formats, commonly used QQ mailbox: QQ number@qq.com, English QQ mailbox: English or numbers@qq.com, foxmail Email account: set up your own account@foxmail.com, mobile phone email account: mobile phone number@qq.com. Supplement: What is qq mailbox? 1 The earliest QQ mailbox was only between QQ users
