HBase的Snapshots功能介绍
hbase的snapshot功能还是挺有用的,本文翻译自cloudera的一篇博客,希望对想了解snapshot?的朋友有点作用,如果翻译得不好的地方,请查看原文 Introduction to Apache HBase Snapshots? 对照。 在之前,备份或者拷贝一个表只能用copy/export表,或者disable
hbase的snapshot功能还是挺有用的,本文翻译自cloudera的一篇博客,希望对想了解snapshot?的朋友有点作用,如果翻译得不好的地方,请查看原文 Introduction to Apache HBase Snapshots? 对照。
在之前,备份或者拷贝一个表只能用copy/export表,或者disable表后,从hdfs中拷贝出所有hfile。copy/export表用的是MapReduce来scan和copy表,这会对Region Server产生直接的性能影响,而用disable后拷贝文件则是直接不能访问了。
以此相反,HBase的snapshots功能可以让管理员不用拷贝数据的情况下轻松拷贝table,并且只会对RS造成很小影响。导出snapshots到另一个集群不会直接作用于RS,只是添加一些额外的逻辑。
下面是一些实用snapshots的场景:
- 从用户/app错误中恢复
- 从某个已知的安全状态恢复/还原。
- 查看之前的snapshots并选择性地从merge到产线中。
- 在重大升级或者修改之前保存snapshots。
- 审查和/或报告指定时间的数据视图
- 有目的性地按月采集数据。
- 运行每天/每月/一刻时间报表。
- 应用测试
- 用snapshots在产线测试schema或者程序改变对数据相似度的影响,然后丢弃它。例如,获取一个snapshot,然后用该snapshot的内容创建一个表,然后对该表进行操作。
- 离线作业
- 获取一个snapshot,导到另外一个集群并用MapReduce作业来分析它。由于导出snapshot的操作发生在HDFS级别,你不会像拷贝表那样拖慢HBase。
什么是Snapshot?
一个snapshot其实就是一组metadata信息的集合,它可以让管理员将表恢复到以前的一个状态。snapshot并不是一份拷贝,它只是一个文件名的列表,并不拷贝数据。一个全的snapshot恢复以为着你可以回滚到原来的表schema和创建snapshot之前的数据。
操作
- 获取:该操作尝试从指定的表中获取一个snapshot。该操作在regions作balancing,split或者merge等迁移工作的时候可能会失败。
- 拷贝:该操作用指定snapshot的schema和数据来创建一个新表。该操作会不会对 原表或者该snapshot造成任何影响。
- 恢复:?该操作将一个表的schema和data回滚到创建该snapshot时的状态。?
- 删除:该操作将一个snapshot从系统中移除,释放磁盘空间,不会对其他拷贝或者snapshot造成任何影响。
- 导出:该操作拷贝这个snapshot的data和metadata到另一个集群。该操作仅影响HDFS,并不会和hbase的Master或者Region Server通信(这些操作可能会导致集群挂掉)。
零拷贝snapshot,恢复,克隆
snapshot和CopyTable/ExportTable最大的区别是snapshot仅涉及metadata,不涉及数据拷贝。
Hbase一个重要的设计就是一旦写到一个文件就不会修改了。有不可修改的文件意味着一个snapshot仅需保持当前文件的使用相关信息就可以了, 并且,当compaction发生的时候,snapshot通知hbase系统仅把这些文件归档而不要删除它。
同样,当克隆或者恢复操作发生的时候,由于这些不变的文件,当用snapshot创建新表的时候仅需链接向这些不变的文件就行了。
导出snapshot是唯一需要拷贝数据的操作,这是因为其它的集群并没有这些数据文件。
导出Snapshot vs Copy/Export Table
除去更加好的一致性保证外,和Copy/Export作业相比,最大的不同是导出snapshot操作是在HDFS层级进行的。这就意味着hbase的master和Region Server是不参与该操作的,因此snapshot导出不会创建一些不必要的数据缓存,并且也不会因为由于很多scan操作导致的GC。snapshot导出操作产生的网络和磁盘开销都被HDFS的datanode分摊吸收了。
HBase Shell: Snapshot 操作
要想使用snapshot功能,请确认你的hbase-site.xml中的hbase.snapshot.enabled
配置项为true,如下:
? ??? hbase.snapshot.enabled ??? true ?
?创建一个snapshot用如下命令,该操作没有文件拷贝操作:
hbase> snapshot ‘tableName’, ‘snapshotName’
要想知道系统中创建了哪些snapshot,可以用list_snapshot命令,它会显示snapshot名,源表和创建时间日期。
?
hbase> list_snapshots SNAPSHOT TABLE + CREATION TIME TestSnapshot TestTable (Mon Feb 25 21:13:49 +0000 2013)
要想移除snapshot,用delete_snapshot命令,移除snapshot不会对已经克隆好的表胡总和随后发生的snapshot造成任何影响。
hbase> delete_snapshot ‘snapshotName’
?要想使用snapshot来创建一个新表,用clone_snapshot命令。该操作也无任何数据拷贝操作发生。
hbase> clone_snapshot ‘snapshotName’, ‘newTableName’
要是想恢复或者替换当前表的schema和数据,用restore_snapshot命令。
hbase> restore_snapshot ‘snapshotName’
要想导出一个snapshot到另外的集群,用ExportSnapshot工具。导出操作不会对Region server造成额外的负担。因为它工作在HDFS层级,你仅需指定HDFS的位置(其它集群的hbase.rootdir)即可,如下。
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot SnapshotName -copy-to hdfs:///srv2:8082/hbase
当前存在的限制
Snapshots依赖于一些想当然的地方,当前还有很多新特性并没有完全集成到工具里:
- 做snapshot或者克隆表时如果发生Merging region操作时数据可能丢失。
- 恢复表的时候,由于是对一个replication进行的,这可能导致两个集群数据不同步。
总结
当前的snapshot特性以及包括了所有基本功能,但是依然还有很多工作要做,例如质量(metrics),Web UI集成,磁盘使用优化等。
要想了解更多snapshot相关信息,请看官方文档的snapshot一节。
非特别说明,均为原创文章,转载请注明: 转载自邓的博客
本文链接地址: HBase的Snapshots功能介绍

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



There will be many AI creation functions in the Doubao app, so what functions does the Doubao app have? Users can use this software to create paintings, chat with AI, generate articles for users, help everyone search for songs, etc. This function introduction of the Doubao app can tell you the specific operation method. The specific content is below, so take a look! What functions does the Doubao app have? Answer: You can draw, chat, write articles, and find songs. Function introduction: 1. Question query: You can use AI to find answers to questions faster, and you can ask any kind of questions. 2. Picture generation: AI can be used to create different pictures for everyone. You only need to tell everyone the general requirements. 3. AI chat: can create an AI that can chat for users,

Both vivox100s and x100 mobile phones are representative models in vivo's mobile phone product line. They respectively represent vivo's high-end technology level in different time periods. Therefore, the two mobile phones have certain differences in design, performance and functions. This article will conduct a detailed comparison between these two mobile phones in terms of performance comparison and function analysis to help consumers better choose the mobile phone that suits them. First, let’s look at the performance comparison between vivox100s and x100. vivox100s is equipped with the latest

With the rapid development of the Internet, the concept of self-media has become deeply rooted in people's hearts. So, what exactly is self-media? What are its main features and functions? Next, we will explore these issues one by one. 1. What exactly is self-media? We-media, as the name suggests, means you are the media. It refers to an information carrier through which individuals or teams can independently create, edit, publish and disseminate content through the Internet platform. Different from traditional media, such as newspapers, television, radio, etc., self-media is more interactive and personalized, allowing everyone to become a producer and disseminator of information. 2. What are the main features and functions of self-media? 1. Low threshold: The rise of self-media has lowered the threshold for entering the media industry. Cumbersome equipment and professional teams are no longer needed.

As Xiaohongshu becomes popular among young people, more and more people are beginning to use this platform to share various aspects of their experiences and life insights. How to effectively manage multiple Xiaohongshu accounts has become a key issue. In this article, we will discuss some of the features of Xiaohongshu account management software and explore how to better manage your Xiaohongshu account. As social media grows, many people find themselves needing to manage multiple social accounts. This is also a challenge for Xiaohongshu users. Some Xiaohongshu account management software can help users manage multiple accounts more easily, including automatic content publishing, scheduled publishing, data analysis and other functions. Through these tools, users can manage their accounts more efficiently and increase their account exposure and attention. In addition, Xiaohongshu account management software has

PHP Tips: Quickly implement the function of returning to the previous page. In web development, we often encounter the need to implement the function of returning to the previous page. Such operations can improve the user experience and make it easier for users to navigate between web pages. In PHP, we can achieve this function through some simple code. This article will introduce how to quickly implement the function of returning to the previous page and provide specific PHP code examples. In PHP, we can use $_SERVER['HTTP_REFERER'] to get the URL of the previous page

"Exploring Discuz: Definition, Functions and Code Examples" With the rapid development of the Internet, community forums have become an important platform for people to obtain information and exchange opinions. Among the many community forum systems, Discuz, as a well-known open source forum software in China, is favored by the majority of website developers and administrators. So, what is Discuz? What functions does it have, and how can it help our website? This article will introduce Discuz in detail and attach specific code examples to help readers learn more about it.

Detailed explanation of the functions and functions of GDM under Linux In the Linux operating system, GDM (GNOMEDisplayManager) is a graphical login manager that provides an interface for users to log in and log out of the system. GDM is usually part of the GNOME desktop environment, but can be used by other desktop environments as well. The role of GDM is not only to provide a login interface, but also includes user session management, screen saver, automatic login and other functions. The functions of GDM mainly include the following aspects:

PHP is a server-side scripting language widely used in web development. Its main function is to generate dynamic web content. When combined with HTML, it can create rich and colorful web pages. PHP is powerful. It can perform various database operations, file operations, form processing and other tasks, providing powerful interactivity and functionality for websites. In the following articles, we will further explore the role and functions of PHP, with detailed code examples. First, let’s take a look at a common use of PHP: dynamic web page generation: P
