Let's talk in depth about why mysql index uses B+ tree structure-Mysql Tutorial-php.cn

This article is an advanced study of mysql. It introduces the reason why mysql uses B-tree as the index data structure. I hope it will be helpful to everyone!

Let's talk in depth about why mysql index uses B+ tree structure

Index improves query efficiency, just like the book we read, if we want to turn directly to a certain chapter, we don’t need to turn page by page, just read it Table of contents, just find the page number where it is located according to the table of contents. [Related recommendations: mysql video tutorial]

In the computer we need a data structure to store this directory. Common data structures include hash tables, binary search trees, and binary balances. Tree (AVL), red-black tree, then why Innodb and MyISAM choose b-tree.

1. Hash table

The hash table is an array linked list, with subscripts 0, 1, 2, 3..... indicating the location of its data. If you want to store data in a hash table, first use a hash algorithm on the data (the basic one is the modulo operation). If the array length is 13, after modulo 13, it will be 0-12, which corresponds to the lower part of the data. If the calculated subscripts are the same, the linked list will follow the subscript position.

Disadvantages:

Using hash storage requires adding all data files to the memory, which consumes more memory space.
Hash search is an equivalent query, which is very fast, but there is no range rule between each data. However, in actual work, more range queries are used, and hash is not suitable.

It cannot be directly said that mysql does not use hash tables, but it must be determined based on the storage engine. The Memory storage engine uses hash tables

2. Binary search tree

Lets talk in depth about why mysql index uses B+ tree structure

Disadvantages:

As shown in the figure, in extreme cases, there may be skew problems, and finally it becomes a linked list structure.
Causes the tree node to be too deep, thereby increasing the IO for search, and now IO is the bottleneck of search

3. Binary balanced tree-AVL

In order to maintain the balance of the tree and avoid data skew, a rotation operation is required. Through left or right rotation, the length of the longest subtree and the shortest subtree cannot exceed 1. If it exceeds 1, it is not an AVL tree in the strict sense.

Lets talk in depth about why mysql index uses B+ tree structure

Disadvantages:

1. When the amount of data is large, in order to maintain balance, 1-n rotations are required. This rotation is a comparison It is a waste of performance, insertion and deletion efficiency is extremely low, and query efficiency is very high.

There are only two branches, and the depth of the tree is still very deep when the amount of data is large.

4. Red-black tree

The longest subtree cannot exceed 2 times the shortest subtree. Through color change and rotation, the insertion and query are balanced

The red-black tree is a variant of the avl tree, which loses part of the query performance to improve the insertion performance.

Lets talk in depth about why mysql index uses B+ tree structure

Disadvantages:

There are also only two branches, and the depth will still be very deep when the amount of data is large

The above three binary trees, as the data increases, will eventually have too many nodes, and they have only 2 branches, so the number of IOs is also a lot.

How to solve it There are only 2 branches and the depth is too deep, so there is a B-tree, add branches

5. B-Tree

First do not read the B-subtract tree, read B-tree

All key values are distributed throughout the tree.

The search may end at a non-leaf node, and a search is performed within the complete set of keywords, and the performance is close to binary search.

Each node has at most m subtrees.

The root node has at least 2 subtrees.

The branch node has at least m/2 subtrees (all branch nodes except the root node and leaf nodes).

All leaf nodes are on the same layer, each node can have up to m-1 keys, and are arranged in ascending order

Lets talk in depth about why mysql index uses B+ tree structure

As shown in the picture above: (Only a part of the picture is drawn. In fact, there are no restrictions, not just p1, p2, and p3)

Each node occupies one disk block. There are two keywords arranged in ascending order on a node and three pointers to the root node of the subtree. The pointers store the disk block address where the child node is located. The three range fields divided by the two keywords correspond to the range fields of the data of the subtree pointed to by the three pointers. Taking the root node as an example, the keywords are 16 and 34. The data range of the subtree pointed by the p1 pointer is less than 16, the data range of the subtree pointed by the p2 pointer is 16-34, and the data range of the subtree pointed by the p3 pointer is greater than 34. .

The process of finding keyword 28:

Find disk block 1 based on the root node and read it into the memory. [First disk I/O operation]
Compare keyword 28 in the interval (16, 34) and find the pointer p2 of disk block 1.
Find disk block 3 based on the p2 pointer and read it into the memory. [Second disk I/O operation]
Compare keyword 28 in the interval (25, 31) and find pointer p2 of disk block 3.
Find disk block 8 based on pointer p2 and read it into memory. [The third disk I/O operation]
Find keyword 28 in the keyword list in disk block 8, end.

Disadvantages:

Each node has a key and contains data, and the storage space of each page is limited. If the data is large, it will cause each The number of keys that a node can store becomes smaller.
When the amount of stored data is large, the depth will become larger and the number of IO times to query the disk will increase, thereby affecting query performance.

6. B-tree

B-tree is an optimization based on B-tree. The changes are as follows:

B Each node of the tree can contain more nodes. There are two reasons for this. The first reason is to reduce the height of the tree. The second reason is to turn the data range into multiple intervals. The more intervals there are, the easier it is to retrieve data. faster.

Non-leaf nodes only store keys, and leaf nodes store keys and data.

The pointers of leaf nodes are connected to each other (in line with the characteristics of disk read-ahead), and the sequential query performance is higher.

Lets talk in depth about why mysql index uses B+ tree structure

As shown above: There are two head pointers on the B-tree, one points to the root node, and the other points to the minimum leaf node of the keyword, and there is a chain ring structure between all leaf nodes (and data nodes), so two B-trees can be There are three search operations: one is a range search and paging search for the primary key, and the other is a random search starting from the root node.

Differences in indexes between InnoDB and MyISAM

1. InnoDB-primary key index

Leaf nodes store specific row data

Lets talk in depth about why mysql index uses B+ tree structure

2. InnoDB-Non-primary key index

The leaf nodes of the non-primary key index store the primary key value (so the query data basically needs to be returned to the table)

Lets talk in depth about why mysql index uses B+ tree structure

3. MyISAM

The leaf nodes store the address of the row data, which requires an additional addressing and one more IO

Lets talk in depth about why mysql index uses B+ tree structure

Summary: Why mysql uses B-tree

Accurate statement: Why the indexes of mysql's InnoDB and MyISAM storage engines use B-tree

Hash table, equivalent query is very fast, but it does not meet the common range search and there is no relationship between two adjacent values, and hash consumes more memory.
Binary trees/balanced binary trees/red-black trees, etc. all have and only have 2 branches. The commonality is that the depth of the tree becomes deeper when the amount of data is large. Increase the number of IOs.
B-tree will store data on nodes, so the number of keys stored on one page will be reduced and the depth of the tree will be increased.
B The non-leaf nodes in the tree have data removed, which will increase the number of keys on a page, and the leaf nodes are connected through linked lists. Convenient for range search and paging.

Original address: https://juejin.cn/post/6994810803643744269

Author: Mr. Ji

For more programming-related knowledge, please visit: Programming Video! !

The above is the detailed content of Let's talk in depth about why mysql index uses B+ tree structure. For more information, please follow other related articles on the PHP Chinese website!