Table of Contents
1. Reasons for migration
2. Some differences between relational databases and NoSQL
(1) Differences in storage methods
(4) Thoughts on the development from MySQL to NoSQL
(5) Why can HBase store massive amounts of data?

Some thoughts and designs on migrating data from MySQL to HBase

Mar 02, 2017 pm 04:45 PM

1. Reasons for migration

As the business has grown, using MySQL for indexing and search has made database I/O the bottleneck of the data flow. For example, every full-table dump puts heavy pressure on the database and takes a long time, and the data volume has now basically reached the 100-million level. To keep MySQL serving well, the next step would have to be sharding into multiple databases and tables. Given this, we are considering HBase for data storage, because HBase can handle far more data than MySQL and adding columns is also very convenient.

2. Some differences between relational databases and NoSQL

(1) Differences in storage methods

In relational databases such as MySQL, SQL Server, and Oracle, data is stored by row, as shown in the following figure:


In HBase, by contrast, data is stored by column (more precisely, grouped by column family), as shown below:
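The difference between the two layouts can be sketched roughly in Python. This is only a toy illustration: the records, the field names (name, age, city), and the family names (info, addr) are made up, and real HBase storage files are far more involved.

```python
# Toy illustration only: the same records laid out two ways.
# All field names and column family names below are hypothetical.
rows = [
    {"id": "r1", "name": "alice", "age": 30, "city": "paris"},
    {"id": "r2", "name": "bob", "age": 25, "city": "rome"},
]


def row_layout(records):
    """Row-oriented: every field of a record is stored together."""
    return [tuple(r.values()) for r in records]


def column_family_layout(records, families):
    """Column-family-oriented: one store per family, keyed by (rowkey, column)."""
    stores = {family: {} for family in families}
    for r in records:
        for family, columns in families.items():
            for col in columns:
                stores[family][(r["id"], col)] = r[col]
    return stores


families = {"info": ["name", "age"], "addr": ["city"]}
row_store = row_layout(rows)
cf_store = column_family_layout(rows, families)
```

In the row layout, reading one field still touches the whole record. In the column-family layout, a query that only needs city reads the addr store and never touches info.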


The logical model of HBase is as follows:


In this model:

com.cnn.ww is the rowkey, which is equivalent to the primary key in MySQL.

contents and anchor are column families. Physically, data in the same column family is stored together in the same files.

cnnsi.com and mylook.ca are columns under a column family. In HBase, columns can be added dynamically.

Each cell in the grid holds one unit of data, that is, the value identified by the rowkey and cf:column.

tn is the timestamp, which distinguishes the different versions of a cell.

One of the storage structures is as follows:



(2) Differences in CRUD operations

CRUD operations are the most basic and commonly used database operations, and HBase has corresponding commands. The MySQL table creation statement is not detailed here. In the HBase shell, a table is created as follows:

create 'table','columnfamily'

This creates a table named table with a column family named columnfamily. Other settings, such as block size and number of versions, take their default values.

To read data, use a statement such as get 'table', 'row', 'cf:column' to fetch the corresponding value.

HBase has no update operation as such. Writing to an existing cell creates a new version, which is distinguished by its timestamp. The statement is:

put 'table', 'row', 'cf:name', 'value'

This writes value to the name column of the cf column family.

Deleting data also differs. In MySQL you can only delete a whole row or set a column to empty, whereas in HBase you can delete a single column directly.
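These semantics can be sketched with a small in-memory model. ToyTable and its methods are hypothetical names for illustration, not the HBase client API. The sketch uses a fake counter for timestamps and keeps every version, whereas a real table limits the number of retained versions.

```python
import itertools


class ToyTable:
    """In-memory stand-in for one HBase table; not the real client API."""

    def __init__(self):
        self._cells = {}                   # (row, column) -> list of (ts, value)
        self._clock = itertools.count(1)   # fake, strictly increasing timestamps

    def put(self, row, column, value):
        # A put never overwrites in place: it appends a new timestamped version.
        ts = next(self._clock)
        self._cells.setdefault((row, column), []).append((ts, value))
        return ts

    def get(self, row, column):
        # A get returns the newest version, or None if the cell does not exist.
        versions = self._cells.get((row, column))
        return versions[-1][1] if versions else None

    def versions(self, row, column):
        # All stored versions of the cell, oldest first.
        return list(self._cells.get((row, column), []))

    def delete(self, row, column):
        # Remove a single column of a row; other columns are untouched.
        self._cells.pop((row, column), None)


table = ToyTable()
table.put("row", "cf:name", "v1")
table.put("row", "cf:name", "v2")   # an "update" is just a newer version
table.put("row", "cf:age", "30")
```

After the two puts, table.get("row", "cf:name") returns the newer value while both versions stay stored, and table.delete("row", "cf:name") removes that column without touching cf:age.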

(3) Differences in indexes

In MySQL you can create indexes to speed up filtered queries, but HBase only indexes the rowkey, so lookups by rowkey are the fastest queries it supports.
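A rough way to see why only rowkey lookups are fast: rowkeys are kept sorted, so a key can be found by binary search, while a filter on any other column has to examine every row. The sketch below is a toy model with made-up keys and values, not HBase code.

```python
import bisect

# Toy data: rowkeys kept in sorted order; keys and values are hypothetical.
rowkeys = ["user001", "user002", "user003", "user004"]
values = [{"city": "rome"}, {"city": "paris"}, {"city": "rome"}, {"city": "oslo"}]


def get_by_rowkey(key):
    """Binary search on the sorted rowkeys: O(log n)."""
    i = bisect.bisect_left(rowkeys, key)
    if i < len(rowkeys) and rowkeys[i] == key:
        return values[i]
    return None


def find_by_column(column, wanted):
    """No index on columns, so every row has to be examined: O(n)."""
    return [k for k, v in zip(rowkeys, values) if v.get(column) == wanted]
```

Here get_by_rowkey("user003") jumps straight to the row, while find_by_column("city", "rome") walks the whole table to find its matches.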

(4) Thoughts on the development from MySQL to NoSQL

Relational databases have a long history, but they struggle as data volume grows. With MySQL, for example, once the data reaches hundreds of millions of rows or more, querying by index may no longer help much. In the end you can only query by primary key, or move step by step toward sharding into multiple databases and tables, which brings a great deal of trouble for operation, maintenance, and use. This is where NoSQL databases, which are built around primary key access, come in. NoSQL, short for "not only SQL", developed and spread as data volumes grew dramatically. Taking HBase as an example, it supports TB- and PB-scale data, and adding columns is particularly flexible.

(5) Why can HBase store massive amounts of data?

In fact, HBase can be regarded as the result of sharding MySQL into databases and tables. The difference is that sharded MySQL tables still support indexes and so on, whereas HBase only supports the rowkey as its primary key index. From the book, we know that HBase stores data by column, and that when the data grows too large it is split by rows, as shown below:



Different regions are placed on different machines, with a master managing them. In effect, the data is partitioned along both rows and columns, which is how a large amount of data can be stored.
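The row-wise split can be sketched as routing each rowkey to the region whose key range contains it. The split points and server names below are hypothetical, and the real assignment and lookup machinery in HBase is more elaborate. This only shows the range-partitioning idea.

```python
import bisect

# Hypothetical split points: region 0 holds keys < "g", region 1 holds
# "g" <= key < "p", and region 2 holds keys >= "p". Server names are made up.
split_points = ["g", "p"]
region_servers = ["server-a", "server-b", "server-c"]


def region_for(rowkey):
    """Return the index of the region whose key range contains rowkey."""
    return bisect.bisect_right(split_points, rowkey)


def server_for(rowkey):
    """Return the (hypothetical) machine that hosts the region for rowkey."""
    return region_servers[region_for(rowkey)]
```

Because each region covers a contiguous, sorted range of rowkeys, adding machines only requires splitting ranges further. A lookup by rowkey still lands on exactly one region.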

3. Some problems encountered in data migration

(1) Problems with composite (joint) indexes

MySQL tables sometimes rely on composite (joint) indexes. For example, take a table that maps products to categories. We need to get all the categories of a given product, and we also want to get all the products of a given category. In MySQL, composite indexes satisfy both requirements directly, but what should we do in HBase, where queries can only go through the rowkey?

After reading the relevant material, I found the following two solutions:

1. Build a wide table

HBase allows different rows to have different columns, as long as they share a column family. For the case above, you can therefore build a wide table that uses the classification id as the rowkey, as shown below:

Classification id, as rowkey

product_id, as column name

The value stores whether the entry is deleted


With the classification id as the rowkey, you can read all product_id values directly from one row and then filter out the deleted ones yourself.
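As a minimal sketch of the wide-table idea, a row can be modeled as a dictionary whose keys are the column names (product ids) and whose values are the deleted flags. All ids here are made up, and a plain dictionary stands in for the HBase table.

```python
# Toy wide table: the rowkey is the classification id, each column is a
# product_id, and the cell value is the deleted flag. All ids are hypothetical.
wide_table = {
    "cat1": {"p100": False, "p101": True, "p102": False},
    "cat2": {"p100": False},
}


def products_in_category(category_id):
    """Read one row, then filter out deleted products on the client side."""
    row = wide_table.get(category_id, {})
    return sorted(pid for pid, deleted in row.items() if not deleted)
```

A single row read answers "all products of a category". The reverse question, all categories of a product, would presumably need a second table keyed by product id, which the design above does not spell out.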

2. Build a tall table

A tall table means that, instead of many columns, you store many rows. Because HBase sorts rowkeys in dictionary order, the following design works:

Classification id_product id, as rowkey


Scanning the rows whose rowkey starts with a given classification id (for example, 1) then returns all the data for that classification.
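The prefix scan can be sketched over a sorted list of rowkeys. The keys below are made up, and the sketch also shows one detail worth watching: with string ordering, the prefix should include the separator, otherwise classification 10 is picked up when scanning for classification 1.

```python
import bisect

# Toy tall table: one row per (classification, product) pair, with rowkey
# "classification_product". Keys are sorted as strings; all ids are hypothetical.
tall_rows = sorted(["1_100", "1_101", "2_100", "10_200", "1_102"])


def scan_prefix(prefix):
    """Return every rowkey that starts with prefix, relying on the sort order."""
    start = bisect.bisect_left(tall_rows, prefix)
    matches = []
    for key in tall_rows[start:]:
        if not key.startswith(prefix):
            break  # sorted order: once the prefix stops matching, we are done
        matches.append(key)
    return matches
```

Scanning with the prefix "1_" returns only the rows of classification 1, whereas the bare prefix "1" would also return the row of classification 10. Fixed-width or zero-padded ids are one common way to sidestep this.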

Essentially, both methods build a secondary index to store the data.


