1. What is a clustered index
Database indexes can be classified in several ways, and the clustered index is one of the resulting categories.
The English term is Clustered Index, and you may occasionally see it under slightly different names. Its counterpart is the non-clustered index, also called the secondary index.
A clustered index is not really a separate index type but a way of storing data. In MySQL's InnoDB storage engine, a clustered index keeps the index and the data rows in the same B+Tree, with the data rows placed in the leaf nodes. "Clustered" means that the data rows and their corresponding key values are stored compactly together.
Suppose I have the following data:
id (primary key) | username | age | address | gender |
1 | ab | 99 | Shenzhen | Male |
2 | ac | 98 | Guangzhou | Male |
3 | af | 88 | Beijing | Female |
4 | bc | 80 | Shanghai | Female |
5 | bg | 85 | Chongqing | Female |
6 | bw | 95 | Tianjin | Male |
7 | bw | 99 | Haikou | Female |
8 | cc | 92 | Wuhan | Male |
9 | ck | 90 | Shenzhen | Male |
10 | cx | 93 | Shenzhen | Male |
Then its clustered index probably looks like this:
You can see that the leaf nodes hold both the primary key values (the index) and the data rows, while the non-leaf nodes hold only primary key values (the index).
Think about it, friends: the data of a MySQL table is saved on disk only once, never twice. Therefore a table can have only one clustered index, never several.
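The layout described above can be sketched as a toy two-level structure. This is only an illustration and not InnoDB's actual page format: the names `build_clustered_index` and `lookup` and the page size of 3 are assumptions made for the example.

```python
from bisect import bisect_right

# Sample rows from the table above: (id, username, age, address, gender).
ROWS = [
    (1, "ab", 99, "Shenzhen", "Male"),
    (2, "ac", 98, "Guangzhou", "Male"),
    (3, "af", 88, "Beijing", "Female"),
    (4, "bc", 80, "Shanghai", "Female"),
    (5, "bg", 85, "Chongqing", "Female"),
    (6, "bw", 95, "Tianjin", "Male"),
    (7, "bw", 99, "Haikou", "Female"),
    (8, "cc", 92, "Wuhan", "Male"),
    (9, "ck", 90, "Shenzhen", "Male"),
    (10, "cx", 93, "Shenzhen", "Male"),
]

PAGE_SIZE = 3  # assumed rows per leaf page, for illustration only


def build_clustered_index(rows, page_size=PAGE_SIZE):
    """Return (separator_keys, leaf_pages).

    The upper level holds primary key values only; the leaf pages hold
    the primary key together with the full row.
    """
    rows = sorted(rows)  # leaves are ordered by primary key
    leaf_pages = [rows[i:i + page_size] for i in range(0, len(rows), page_size)]
    separator_keys = [page[0][0] for page in leaf_pages]  # smallest key per leaf
    return separator_keys, leaf_pages


def lookup(index, pk):
    """Find a row by primary key: one descent, and the row is right there."""
    separator_keys, leaf_pages = index
    pos = bisect_right(separator_keys, pk) - 1  # which leaf covers pk
    if pos < 0:
        return None
    for row in leaf_pages[pos]:
        if row[0] == pk:
            return row
    return None


index = build_clustered_index(ROWS)
print(index[0])          # upper level: keys only
print(lookup(index, 5))  # leaf level: the complete row
```

Note that the upper level contains nothing but keys, and reaching the leaf already yields the complete row with no second lookup.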
2. Clustered index and primary key
Some friends are unclear about the relationship between the two and even treat them as the same thing. This is a big misunderstanding.
In some databases, developers are allowed to freely choose which index to use as a clustered index, but MySQL does not support this feature.
In MySQL (InnoDB), if the table defines a primary key, the primary key is the clustered index. If the table has no primary key, the first unique index whose columns are all NOT NULL is chosen as the clustered index. If there is no such index either, an implicit (hidden) primary key is generated automatically and used as the clustered index. Brother Song will introduce the implicit primary key of MySQL tables in a future article.
However, it is generally recommended that you define the primary key yourself. The implicit primary key is auto-incrementing, and auto-incrementing has a problem: the auto-increment value is subject to heavy lock contention. The upper bound of the primary key becomes hot data, because every insert must increment the key and no value may repeat, so inserts compete for the lock and performance drops.
Based on the above, the relationship between the clustered index and the primary key index in MySQL can be summarized as follows: a primary key index is always the clustered index, but the clustered index is not necessarily a primary key index.
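The selection rule above can be written down as a small decision function. This is a model of the rule, not MySQL code: `IndexDef` and `choose_clustered_index` are hypothetical names, and the only real identifier is `GEN_CLUST_INDEX`, the name InnoDB gives to the hidden clustered index.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class IndexDef:
    """Hypothetical description of one index on a table."""
    name: str
    columns: List[str]
    unique: bool = False
    primary: bool = False


def choose_clustered_index(indexes: List[IndexDef], nullable_columns: Set[str]) -> str:
    """Return the name of the index that would act as the clustered index."""
    # 1. An explicit primary key always wins.
    for idx in indexes:
        if idx.primary:
            return idx.name
    # 2. Otherwise the first unique index whose columns are all NOT NULL.
    for idx in indexes:
        if idx.unique and not any(col in nullable_columns for col in idx.columns):
            return idx.name
    # 3. Otherwise InnoDB falls back to a hidden clustered index.
    return "GEN_CLUST_INDEX"


with_pk = [IndexDef("PRIMARY", ["id"], unique=True, primary=True),
           IndexDef("uk_username", ["username"], unique=True)]
without_pk = [IndexDef("idx_age", ["age"]),
              IndexDef("uk_username", ["username"], unique=True)]

print(choose_clustered_index(with_pk, nullable_columns=set()))
print(choose_clustered_index(without_pk, nullable_columns=set()))
print(choose_clustered_index(without_pk, nullable_columns={"username"}))
```

The three calls exercise the three branches in order: explicit primary key, unique NOT NULL index, and hidden index.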
3. Clustered index advantages and disadvantages
Let’s talk about the advantages first:
It keeps related data together. For example, take a user order table that is clustered on (user ID, order ID). User IDs may repeat, but order IDs do not. All order rows belonging to one user are then stored next to each other, so querying all orders of a user is very fast and needs only a small amount of disk IO.
No back-to-table lookup is needed, so data access is faster. In a clustered index, the index and the data live in the same B+Tree, so retrieving data through the clustered index is faster than retrieving it through a non-clustered index (which requires going back to the table).
Continuing the first case, if we only want all the order IDs of a user given the user ID, the values we need are already part of the index key itself. There is no need to read the rest of each row, and the query can be answered directly in the manner of a covering index.
These are some common advantages of clustered indexes. In fact, we should make full use of these advantages in daily table design.
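The first advantage can be sketched with a sorted list standing in for a table clustered on (user ID, order ID). The sample orders and the function names are invented for illustration; the point is only that one user's rows form a single contiguous range.

```python
from bisect import bisect_left

# Rows are (user_id, order_id, item), stored sorted by the composite key,
# which is how a table clustered on (user_id, order_id) would lay them out.
orders = sorted([
    (2, 1003, "keyboard"),
    (1, 1001, "book"),
    (3, 1004, "lamp"),
    (1, 1005, "pen"),
    (2, 1002, "mouse"),
    (1, 1006, "desk"),
])


def orders_of_user(rows, user_id):
    """Return the contiguous slice of rows that belong to one user."""
    start = bisect_left(rows, (user_id,))      # first row of this user
    end = bisect_left(rows, (user_id + 1,))    # first row of the next user
    return rows[start:end]


def order_ids_of_user(rows, user_id):
    """Only key columns are needed, so the rest of each row is ignored."""
    return [order_id for _, order_id, _ in orders_of_user(rows, user_id)]


print(orders_of_user(orders, 1))
print(order_ids_of_user(orders, 2))
```

Because the rows are ordered by the composite key, two binary searches find the whole range, and no row outside that range is touched.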
Let’s take a look at the shortcomings:
As you may have noticed, the advantages above come mainly from reducing the number of IO operations, which improves database performance. However, some IO-intensive applications simply provision enough memory to hold all the data and operate on it in memory. In that case, the clustered index offers no advantage.
Random primary keys cause page splitting. If primary keys are inserted in order, insertion is relatively efficient, because in the B+Tree you only need to keep appending at the end. If primary keys are inserted out of order, efficiency is much lower because page splits may be involved. Taking the figure above as an example, assume each node can hold three records and we now insert a record with primary key 4.5. The record with primary key 5 has to move back, which in turn pushes the node holding primary key 8 back as well. Page splitting makes inserts less efficient and takes up more storage space.
Queries through a non-clustered (secondary) index need to go back to the table. Each index is its own tree, and the full rows live only in the clustered index, so the leaves of a non-clustered index store primary key values. A search first finds the primary key value in the non-clustered index and then uses it to search the clustered index. Two index trees are traversed in total, and this is what "going back to the table" means.
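The effect of insert order can be simulated with a simplified page model. This is a sketch under stated assumptions, not InnoDB's real split algorithm: each page holds at most 3 keys (as in the example above), a full last page receiving a larger key just gets a new page appended, and any other full page is split in half. The name `insert_all` is hypothetical.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 3  # assumed keys per page, matching the example in the text


def insert_all(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one; return (page_split_count, pages)."""
    pages = []   # each page is a sorted list of keys
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Locate the page whose key range covers `key`.
        firsts = [page[0] for page in pages]
        i = max(bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            insort(page, key)                      # room left: plain insert
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])                    # sequential append: new page
        else:
            insort(page, key)                      # overflow in the middle
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]  # split into two pages
            splits += 1
    return splits, pages


# Sequential keys 1..10: pages fill up and are appended, no splits.
print(insert_all(range(1, 11)))
# Same keys, then 4.5 arrives: the full page [4, 5, 6] must split.
print(insert_all(list(range(1, 11)) + [4.5]))
# Descending keys: every insert lands at the front and keeps splitting pages.
print(insert_all(range(10, 0, -1)))
```

In this model the out-of-order cases end up with more pages that are only partly filled, which is the extra storage cost the text refers to.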
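The two-step search can be modeled as follows. A dict stands in for the clustered index and a sorted list for the secondary index on username; both are simplifications, and the function names and the `lookups` counter are assumptions added to make the number of tree searches visible.

```python
from bisect import bisect_left

# Clustered index stand-in: primary key -> full row.
clustered = {
    5: {"id": 5, "username": "bg", "age": 85},
    6: {"id": 6, "username": "bw", "age": 95},
    7: {"id": 7, "username": "bw", "age": 99},
    8: {"id": 8, "username": "cc", "age": 92},
}

# Secondary index on username: entries hold (username, primary key) only.
secondary_username = sorted((row["username"], pk) for pk, row in clustered.items())


def primary_keys_for(username):
    """Search the secondary index only; returns the matching primary keys."""
    start = bisect_left(secondary_username, (username,))
    pks = []
    for name, pk in secondary_username[start:]:
        if name != username:
            break
        pks.append(pk)
    return pks


def find_by_username(username):
    """Return (rows, lookups): one secondary search plus one clustered search per row."""
    lookups = 1                      # first tree: the secondary index
    pks = primary_keys_for(username)
    rows = []
    for pk in pks:
        lookups += 1                 # second tree: back to the clustered index
        rows.append(clustered[pk])
    return rows, lookups


rows, lookups = find_by_username("bw")
print(rows, lookups)
print(primary_keys_for("bw"))  # ids alone never touch the clustered index
```

If a query needs only the username and the id, `primary_keys_for` alone answers it, which is why selecting just indexed columns avoids going back to the table.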