MySQL database supports a variety of indexes, such as B-tree indexes, hash indexes, full-text indexes, etc. This article focuses on B-tree indexes. (Recommended: "mysql Tutorial")
Index principle & essence
MySQL official explanation: Index is data that improves the efficiency of data acquisition for MySQL. structure for fast querying of data. Indexes are data structures that satisfy a specific search algorithm, and these data structures point to data in a certain way to achieve efficient data search.
B tree
MySQL generally uses B tree as its index structure, so what are the characteristics of B tree?
If the tree degree is n, the upper limit of each node pointer is 2n 1
Non-leaf nodes do not store data, only pointer indexes; leaf nodes store all data, but do not store pointers
Based on the classic B-tree, a sequential access pointer is added. Each leaf node has a pointer to the next adjacent leaf node, as shown in the figure. Mainly to improve the performance of interval access. For example, if you want to find all the data with key 20 to 50, you only need to access all data nodes at once according to the sequential access route.
B-tree diagram with sequential access
Locality principle and disk read-ahead
So why Database systems generally use B-trees as index structures, instead of other structures such as red-black trees?
First of all, let’s introduce the principle of locality and the concept of disk read-ahead.
Generally speaking, the index itself is large and will not be stored entirely in memory. It will be stored on disk in the form of an index file. Therefore, disk IO operations will occur during the index search process, and disk IO is very slow compared to memory access, so the index structure should minimize the number of disk IO accesses.
In order to reduce disk IO, the disk often performs data pre-reading, starting from a certain position and pre-reading a certain length of data backwards into the memory, which is the principle of locality. Because disk sequential reading is more efficient and does not require seek time, it can improve IO efficiency.
The read-ahead length is generally an integer multiple of the page, and the main memory and disk exchange data in units of pages. When the data that needs to be read is not in the memory, a page fault interrupt is triggered. The system will send a request to the disk to read the disk data. The disk finds the starting position of the data and continuously reads one or several pages of data backwards and loads them into the memory. Then the interrupt returns and the system continues running. When designing a general database system, the size of the B-tree node is set to one page, so that loading of each node only requires one IO.
The above is the detailed content of The principle of MySQL index. For more information, please follow other related articles on the PHP Chinese website!