ACID in HBase-mysql チュートリアル-php.cn

ホームページ

データベース

mysql チュートリアル

ACID in HBase

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:28 PM

acid hbase

By Lars Hofhansl As we know, ACID stands for Atomicity, Consistency, Isolation, and Durability. HBase supports ACID in limited ways, namely Puts to the same row provide all ACID guarantees. (HBASE-3584 adds multi op transactions and HBASE-

By Lars Hofhansl

As we know, ACID stands for Atomicity, Consistency, Isolation, and Durability.

HBase supports ACID in limited ways, namely Puts to the same row provide all ACID guarantees. (HBASE-3584 adds multi op transactions and HBASE-5229 adds multi row transactions, but the principle remains the same)

So how does ACID work in HBase?

HBase employs a kind of MVCC. And HBase has no mixed read/write transactions.

The nomenclature in HBase is bit strange for historical reasons. In a nutshell each RegionServer maintains what I will call "strictly monotonically increasing transaction numbers".

When a write transaction (a set of puts or deletes) starts it retrieves the next highest transaction number. In HBase this is called a WriteNumber.
When a read transaction (a Scan or Get) starts it retrieves the transaction number of the last committed transaction. HBase calls this the ReadPoint.

Each created KeyValue is tagged with its transaction's WriteNumber (this tag, for historical reasons, is called the memstore timestamp in HBase. Note that this is separate from the application-visible timestamp.)

The highlevel flow of a write transaction in HBase looks like this:

lock the row(s), to guard against concurrent writes to the same row(s)
retrieve the current writenumber
apply changes to the WAL (Write Ahead Log)
apply the changes to the Memstore (using the acquired writenumber to tag the KeyValues)
commit the transaction, i.e. attempt to roll the Readpoint forward to the acquired Writenumber.
unlock the row(s)

The highlevel flow of a read transaction looks like this:

open the scanner
get the current readpoint
filter all scanned KeyValues with memstore timestamp > the readpoint
close the scanner (this is initiated by the client)

In reality it is a bit more complicated, but this is enough to illustrate the point. Note that a reader acquires no locks at all, but we still get all of ACID.

It is important to realize that this only works if transactions are committed strictly serially; otherwise an earlier uncommitted transaction could become visible when one that started later commits first. In HBase transaction are typically short, so this is not a problem.

HBase does exactly that: All transactions are committed serially.

Committing a transaction in HBase means settting the current ReadPoint to the transaction's WriteNumber, and hence make its changes visible to all new Scans.
HBase keeps a list of all unfinished transactions. A transaction's commit is delayed until all prior transactions committed. Note that HBase can still make all changes immediately and concurrently, only the commits are serial.

Since HBase does not guarantee any consistency between regions (and each region is hosted at exactly one RegionServer) all MVCC data structures only need to be kept in memory on every region server.

The next interesting question is what happens during compactions.

In HBase compactions are used to join multiple small store files (create by flushes of the MemStore to disk) into a larger ones and also to remove "garbage" in the process.
Garbage here are KeyValues that either expired due to a column family's TTL or VERSION settings or were marked for deletion. See here and here for more details.

Now imagine a compaction happening while a scanner is still scanning through the KeyValues. It would now be possible see a partial row (see here for how HBase defines a "row") - a row comprised of versions of KeyValues that do not reflect the outcome of any serializable transaction schedule.

The solution in HBase is to keep track of the earliest readpoint used by any open scanner and never collect any KeyValues with a memstore timestamp larger than that readpoint. That logic was - among other enhancements - added with HBASE-2856, which allowed HBase to support ACID guarantees even with concurrent flushes.
HBASE-5569 finally enables the same logic for the delete markers (and hence deleted KeyValues).

Lastly, note that a KeyValue's memstore timestamp can be cleared (set to 0) when it is older than the oldest scanner. I.e. it is known to be visible to every scanner, since all earlier scanner are finished.

Update Thursday, March 22:
A couple of extra points:

The readpoint is rolled forward even if the transaction failed in order to not stall later transactions that waiting to be committed (since this is all in the same process, that just mean the roll forward happens in a Java finally block).
When updates are written to the WAL a single record is created for the all changes. There is no separate commit record.
When a RegionServer crashes, all in flight transaction are eventually replayed on another RegionServer if the WAL record was written completely or discarded otherwise.

原文地址：ACID in HBase, 感谢原作者分享。

このウェブサイトの声明

この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。

ホットAIツール

Undresser.AI Undress

リアルなヌード写真を作成する AI 搭載アプリ

AI Clothes Remover

写真から衣服を削除するオンライン AI ツール。

Undress AI Tool

脱衣画像を無料で

Clothoff.io

AI衣類リムーバー

AI Hentai Generator

AIヘンタイを無料で生成します。

ホットツール

メモ帳++7.3.1

使いやすく無料のコードエディター

SublimeText3 中国語版

中国語版、とても使いやすい

ゼンドスタジオ 13.0.1

強力な PHP 統合開発環境

ドリームウィーバー CS6

ビジュアル Web 開発ツール

SublimeText3 Mac版

神レベルのコード編集ソフト（SublimeText3）

ホットトピック

Gmailメールのログイン入り口はどこですか？

7554

CakePHP チュートリアル

1382

Steamのアカウント名の形式は何ですか

Win11 Activation Key Permanent

NYTの接続はヒントと回答です

Related knowledge

Beego で Hadoop と HBase を使用してビッグデータストレージとクエリを実行する Jun 22, 2023 am 10:21 AM

ビッグデータ時代の到来に伴い、データの処理と保存の重要性がますます高まっており、大量のデータをいかに効率的に管理、分析するかが企業にとっての課題となっています。 Apache Foundation の 2 つのプロジェクトである Hadoop と HBase は、ビッグデータのストレージと分析のためのソリューションを提供します。この記事では、ビッグデータのストレージとクエリのために Beego で Hadoop と HBase を使用する方法を紹介します。 1. Hadoop と HBase の概要 Hadoop は、オープンソースの分散ストレージおよびコンピューティングシステムです。

PHP プログラミングにおける Data Rigor by Design (ACID) Jun 22, 2023 am 09:04 AM

PHP プログラミングにおけるデータ厳密設計 (ACID) PHP プログラミングでは、データ厳密設計は非常に重要な側面です。信頼できるアプリケーションは、データを正しく処理するだけでなく、データのセキュリティと一貫性を確保する必要もあります。このため、開発者はシステムの安定性と信頼性を確保するために、データ設計に ACID を使用する必要があります。 ACID は、原子性、一貫性、分離性、耐久性を指します。

SpringBoot に hbase を統合する方法 May 30, 2023 pm 04:31 PM

依存関係: org.springframework.dataspring-data-hadoop-hbase2.5.0.RELEASEorg.apache.hbasehbase-client1.1.2org.springframework.dataspring-data-hadoop2.5.0.RELEASE 構成を追加する正式な方法は、xml を使用することです。 simple 書き換えると以下のようになります。 @ConfigurationpublicclassHBaseConfiguration{@Value("${hbase.zooke

Java を使用して HBase に基づく NoSQL データベースアプリケーションを開発する方法 Sep 20, 2023 am 08:39 AM

Java を使用して HBase に基づいた NoSQL データベースアプリケーションを開発する方法はじめに: ビッグデータ時代の到来により、NoSQL データベースは大量のデータを処理するための重要なツールの 1 つになりました。 HBase は、オープンソースの分散型 NoSQL データベースシステムとして、ビッグデータの分野で広範なアプリケーションを備えています。この記事では、Java を使用して HBase に基づく NoSQL データベースアプリケーションを開発する方法を紹介し、具体的なコード例を示します。 1. HBase の概要: HBase は、Hadoop に基づく分散システムです。

Go 言語で HBase を使用して効率的な NoSQL データベースアプリケーションを実装する Jun 15, 2023 pm 08:56 PM

ビッグデータ時代の到来により、大量のデータの保存と処理が特に重要になっています。 NoSQL データベースに関しては、HBase が現在広く使用されているソリューションです。 Go 言語は、静的に強く型付けされたプログラミング言語であり、そのシンプルな構文と優れたパフォーマンスにより、クラウドコンピューティング、Web サイト開発、データサイエンスなどの分野で使用されることが増えています。この記事では、Go 言語で HBase を使用して効率的な NoSQL データベースアプリケーションを実装する方法を紹介します。 HBase の概要 HBase は、拡張性が高く、信頼性が高く、基本的な

PHP と Apache HBase を統合して NoSQL データベースと分散ストレージを実装 Jun 25, 2023 pm 06:01 PM

インターネットアプリケーションとデータ量の継続的な増加に伴い、従来のリレーショナルデータベースでは、大量のデータの保存と処理のニーズを満たすことができなくなりました。新しいタイプのデータベース管理システムとして、NoSQL (NotOnlySQL) は大規模なデータの保存と処理において大きな利点があり、ますます注目され、応用されています。 NoSQL データベースの中でも、ApacheHBase は非常に人気のあるオープンソースの分散データベースであり、Google の BigTable のアイデアに基づいて設計されており、

Beego でのデータストレージとクエリに HBase を使用する Jun 22, 2023 am 11:58 AM

Beego フレームワークでのデータストレージとクエリに HBase を使用するインターネット時代の継続的な発展に伴い、データストレージとクエリはますます重要になってきています。ビッグデータ時代の到来により、さまざまなデータソースがそれぞれの分野で重要な位置を占めていますが、非リレーショナルデータベースはデータストレージとクエリに明らかな利点を備えたデータベースであり、HBaseはHadoopをベースとした分散型非リレーショナルデータベースです。リレーショナルデータベース。この記事では、Beego フレームワークでのデータストレージとクエリに HBase を使用する方法を紹介します。 1.H

Workerman でのデータストレージとクエリに HBase を使用する方法 Nov 07, 2023 am 08:30 AM

Workerman は、多数の同時接続をホストできる高性能 PHPsocket フレームワークです。従来の PHP フレームワークとは異なり、Workerman は Apache や Nginx などの Web サーバーに依存せず、PHP プロセスを開始することでアプリケーション全体を単独で実行します。 Workerman は非常に高い作業効率と優れた耐荷重性を備えています。同時に、HBase はビッグデータで広く使用されている分散型 NoSQL データベースシステムです。

See all articles