This article is an advanced study of mysql. It introduces the reason why mysql uses B-tree as the index data structure. I hope it will be helpful to everyone!
Index improves query efficiency, just like the book we read, if we want to turn directly to a certain chapter, we don’t need to turn page by page, just read it Table of contents, just find the page number where it is located according to the table of contents. [Related recommendations: mysql video tutorial]
In the computer we need a data structure to store this directory. Common data structures include hash tables, binary search trees, and binary balances. Tree (AVL), red-black tree, then why Innodb and MyISAM choose b-tree.
The hash table is an array linked list, with subscripts 0, 1, 2, 3..... indicating the location of its data. If you want to store data in a hash table, first use a hash algorithm on the data (the basic one is the modulo operation). If the array length is 13, after modulo 13, it will be 0-12, which corresponds to the lower part of the data. If the calculated subscripts are the same, the linked list will follow the subscript position.
Disadvantages:
It cannot be directly said that mysql does not use hash tables, but it must be determined based on the storage engine. The Memory storage engine uses hash tables
Disadvantages:
In order to maintain the balance of the tree and avoid data skew, a rotation operation is required. Through left or right rotation, the length of the longest subtree and the shortest subtree cannot exceed 1. If it exceeds 1, it is not an AVL tree in the strict sense.
Disadvantages:
1. When the amount of data is large, in order to maintain balance, 1-n rotations are required. This rotation is a comparison It is a waste of performance, insertion and deletion efficiency is extremely low, and query efficiency is very high.
The longest subtree cannot exceed 2 times the shortest subtree. Through color change and rotation, the insertion and query are balanced
The red-black tree is a variant of the avl tree, which loses part of the query performance to improve the insertion performance.
Disadvantages:
There are also only two branches, and the depth will still be very deep when the amount of data is large
The above three binary trees, as the data increases, will eventually have too many nodes, and they have only 2 branches, so the number of IOs is also a lot.
How to solve it There are only 2 branches and the depth is too deep, so there is a B-tree, add branches
- First do not read the B-subtract tree, read B-tree
- All key values are distributed throughout the tree.
- The search may end at a non-leaf node, and a search is performed within the complete set of keywords, and the performance is close to binary search.
- Each node has at most m subtrees.
- The root node has at least 2 subtrees.
- The branch node has at least m/2 subtrees (all branch nodes except the root node and leaf nodes).
- All leaf nodes are on the same layer, each node can have up to m-1 keys, and are arranged in ascending order
As shown in the picture above: (Only a part of the picture is drawn. In fact, there are no restrictions, not just p1, p2, and p3)
Each node occupies one disk block. There are two keywords arranged in ascending order on a node and three pointers to the root node of the subtree. The pointers store the disk block address where the child node is located. The three range fields divided by the two keywords correspond to the range fields of the data of the subtree pointed to by the three pointers. Taking the root node as an example, the keywords are 16 and 34. The data range of the subtree pointed by the p1 pointer is less than 16, the data range of the subtree pointed by the p2 pointer is 16-34, and the data range of the subtree pointed by the p3 pointer is greater than 34. .
The process of finding keyword 28:
Disadvantages:
B-tree is an optimization based on B-tree. The changes are as follows:
- B Each node of the tree can contain more nodes. There are two reasons for this. The first reason is to reduce the height of the tree. The second reason is to turn the data range into multiple intervals. The more intervals there are, the easier it is to retrieve data. faster.
- Non-leaf nodes only store keys, and leaf nodes store keys and data.
- The pointers of leaf nodes are connected to each other (in line with the characteristics of disk read-ahead), and the sequential query performance is higher.
As shown above: There are two head pointers on the B-tree, one points to the root node, and the other points to the minimum leaf node of the keyword, and there is a chain ring structure between all leaf nodes (and data nodes), so two B-trees can be There are three search operations: one is a range search and paging search for the primary key, and the other is a random search starting from the root node.
Leaf nodes store specific row data
The leaf nodes of the non-primary key index store the primary key value (so the query data basically needs to be returned to the table)
The leaf nodes store the address of the row data, which requires an additional addressing and one more IO
Accurate statement: Why the indexes of mysql's InnoDB and MyISAM storage engines use B-tree
Hash table, equivalent query is very fast, but it does not meet the common range search and there is no relationship between two adjacent values, and hash consumes more memory.
Binary trees/balanced binary trees/red-black trees, etc. all have and only have 2 branches. The commonality is that the depth of the tree becomes deeper when the amount of data is large. Increase the number of IOs.
B-tree will store data on nodes, so the number of keys stored on one page will be reduced and the depth of the tree will be increased.
B The non-leaf nodes in the tree have data removed, which will increase the number of keys on a page, and the leaf nodes are connected through linked lists. Convenient for range search and paging.
Original address: https://juejin.cn/post/6994810803643744269
Author: Mr. Ji
For more programming-related knowledge, please visit: Programming Video! !
The above is the detailed content of Let's talk in depth about why mysql index uses B+ tree structure. For more information, please follow other related articles on the PHP Chinese website!