As the volume of data collected by our crawlers keeps growing, I have spent the past two days optimizing the database and its queries. One of the tables is defined as follows:
CREATE TABLE `newspaper_article` (
  `id` varchar(50) NOT NULL COMMENT 'ID',
  `title` varchar(190) NOT NULL COMMENT 'Title',
  `author` varchar(255) DEFAULT NULL COMMENT 'Author',
  `date` date NULL DEFAULT NULL COMMENT 'Publication date',
  `content` longtext COMMENT 'Body text',
  `status` tinyint(4) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_status_date` (`status`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Article table';
To meet business needs, the idx_status_date index was added. Even so, the following SQL statement is particularly slow to execute:
SELECT id, title, status, date FROM newspaper_article WHERE status > -2 AND date = '2016-01-07';
From what I have observed, no more than about 2,500 new rows are added each day. Since the query specifies a single date, '2016-01-07', I expected it to scan at most about 2,500 rows, but that is not what happened. The query actually scanned 185,589 rows, far more than the estimated 2,500, and took nearly 3 seconds to execute.
Why is this?
After changing idx_status_date (status, date) to idx_status (status), I checked the MySQL execution plan again.
After replacing the multi-column index with a single-column index, the number of rows to be scanned in the execution plan did not change. Since multi-column indexes follow the leftmost-prefix principle, I suspected that the query above uses only the leftmost column, status, of idx_status_date.
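The effect can be reproduced with a small model. The sketch below is only an illustration and not how MySQL is implemented: it assumes a composite index behaves like a list of (status, date) tuples kept in sorted order, and the data set (4 status values, 30 days, 5 articles per status and day) is made up.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Hypothetical data: 4 status values x 30 days x 5 articles per (status, day).
START = date(2016, 1, 1)
rows = [
    (status, START + timedelta(days=d))
    for status in (-2, -1, 0, 1)
    for d in range(30)
    for _ in range(5)
]

# A composite index on (status, date) is ordered by status first,
# then by date within each status value.
idx_status_date = sorted(rows)


def scan_status_range_then_date(index, min_status_exclusive, target):
    """Model of: WHERE status > min_status_exclusive AND date = target."""
    # Only the range on the leading column narrows the scan:
    # skip every entry whose status is <= min_status_exclusive.
    start = bisect_right(index, (min_status_exclusive, date.max))
    examined = index[start:]
    # Matching dates are scattered across the status groups, so the date
    # condition can only be applied as a filter over the whole range.
    matched = [entry for entry in examined if entry[1] == target]
    return len(examined), len(matched)


examined, matched = scan_status_range_then_date(
    idx_status_date, -2, date(2016, 1, 7)
)
print(examined, matched)
```

In this model 450 index entries are examined to find 15 matching rows, which mirrors the gap between the 185,589 rows scanned and the roughly 2,500 rows expected.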
I looked through "High Performance MySQL" and found the following passage, which confirmed my suspicion:
If a query has a range condition on some column, none of the columns to its right can use the index to optimize the lookup. For example, take a query with WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '1976-12-23'. This query can use only the first two columns of the index, because LIKE here is a range condition (although the server can use the remaining columns for other purposes). If the column in the range condition has a limited number of values, you can replace the range condition with multiple equality conditions.
Therefore, there are two solutions here:
1. Replace the range condition with multiple equality conditions.
2. Change idx_status_date (status, date) to idx_date_status (date, status), and create a new idx_status index, to achieve the same effect.
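Both solutions can be compared in the same kind of sorted-tuple model. This is a sketch under assumptions: the set of status values (-2, -1, 0, 1) is invented, so rewriting status > -2 as status IN (-1, 0, 1) is only valid for this made-up data, and the helper equal_range is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

START = date(2016, 1, 1)
STATUSES = (-2, -1, 0, 1)  # assumption: the only status values in use
TARGET = date(2016, 1, 7)

# Hypothetical data: 5 articles per (status, day) over 30 days.
pairs = [
    (status, START + timedelta(days=d))
    for status in STATUSES
    for d in range(30)
    for _ in range(5)
]


def equal_range(index, key):
    """Return the index entries that are exactly equal to key."""
    return index[bisect_left(index, key):bisect_right(index, key)]


# Solution 1: keep (status, date), but turn the range into equalities,
# i.e. status IN (-1, 0, 1) AND date = TARGET. Each status value becomes
# its own exact seek on both columns.
idx_status_date = sorted(pairs)
examined_in_list = sum(
    len(equal_range(idx_status_date, (status, TARGET)))
    for status in (-1, 0, 1)
)

# Solution 2: index on (date, status). The equality column comes first,
# so the range on status is applied inside a single date group.
idx_date_status = sorted((dt, s) for s, dt in pairs)
lo = bisect_right(idx_date_status, (TARGET, -2))  # first entry with status > -2
hi = bisect_right(idx_date_status, (TARGET, max(STATUSES)))
examined_reordered = hi - lo

print(examined_in_list, examined_reordered)
```

With this data each approach touches only the 15 matching entries. The general point is that equality conditions should come before range conditions in the column order of the index.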
Optimized execution plan:
Actual execution result:
Summary
When people talk about indexes without specifying a type, they are probably talking about B-Tree indexes, which use a B-Tree data structure to store data. We use the term "B-Tree" because MySQL also uses this keyword in CREATE TABLE and other statements. However, the underlying storage engine may use a different storage structure; InnoDB, for example, uses B+Tree.
Suppose there is the following data table:
CREATE TABLE People (
  last_name varchar(50) not null,
  first_name varchar(50) not null,
  dob date not null,
  gender enum('m', 'f') not null,
  key(last_name, first_name, dob)
);
A B-Tree index is useful for the following kinds of queries:
Full-value match: matches all columns in the index. For example, the index in the table above can be used to find a person named Cuba Allen who was born on 1960-01-01.
Leftmost-prefix match: the index in the table above can be used to find all people with the last name Allen, that is, only the first column of the index is used.
Column-prefix match: matches only the beginning of a column's value. For example, the index in the table above can be used to find all people whose last names begin with J. Only the first column of the index is used here.
Range match: for example, the index in the table above can be used to find people with last names between Allen and Barrymore. Only the first column of the index is used here.
Exact match on one column and range match on another: the index in the table above can also be used to find all people whose last name is Allen and whose first name starts with the letter K (such as Kim or Karl). That is, the first column, last_name, is matched exactly, and the second column, first_name, is matched as a range.
Index-only queries: a B-Tree index can usually support queries that access only the index, that is, the query reads the index without accessing the data rows.
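The match types above all come down to the same thing: the rows wanted form one contiguous run in the sorted index. A minimal sketch, assuming key(last_name, first_name, dob) behaves like a sorted list of tuples; the rows, the HIGH sentinel, and the seek helper are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows, kept sorted like key(last_name, first_name, dob).
# Dates are ISO strings so that they sort chronologically.
index = sorted([
    ("Allen", "Cuba", "1960-01-01"),
    ("Allen", "Kim", "1930-07-12"),
    ("Allen", "Karl", "1975-03-02"),
    ("Astaire", "Angelina", "1980-03-04"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Jones", "Mary", "1955-11-30"),
])

HIGH = "\uffff"  # sentinel that sorts after every name used here


def seek(lo_key, hi_key):
    """Return the contiguous run of entries with lo_key <= entry <= hi_key."""
    return index[bisect_left(index, lo_key):bisect_right(index, hi_key)]


# Full-value match: all three columns specified.
full_value = seek(("Allen", "Cuba", "1960-01-01"), ("Allen", "Cuba", "1960-01-01"))
# Leftmost-prefix match: last_name = 'Allen'.
leftmost_prefix = seek(("Allen",), ("Allen", HIGH))
# Column-prefix match: last_name LIKE 'J%'.
column_prefix = seek(("J",), ("J" + HIGH,))
# Range match: last_name BETWEEN 'Allen' AND 'Barrymore'.
value_range = seek(("Allen",), ("Barrymore", HIGH))
# Exact match on last_name plus a range on first_name (LIKE 'K%').
exact_plus_range = seek(("Allen", "K"), ("Allen", "K" + HIGH))

for name, result in [
    ("full value", full_value),
    ("leftmost prefix", leftmost_prefix),
    ("column prefix", column_prefix),
    ("range", value_range),
    ("exact + range", exact_plus_range),
]:
    print(name, len(result))
```

Every lookup here is answered by two binary searches and one slice, with no filtering afterwards, which is what makes these query shapes a good fit for the index.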
B-Tree indexes also have some limitations:
Columns in the index cannot be skipped. That is, the index in the table above cannot be used to find people with the last name Smith who were born on a specific date. If you do not specify a value for first_name, MySQL can use only the first column of the index.
If a query has a range condition on some column, none of the columns to its right can use the index to optimize the lookup. For example, take a query with WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '1976-12-23'. This query can use only the first two columns of the index, because LIKE here is a range condition (although the server can use the remaining columns for other purposes). If the column in the range condition has a limited number of values, you can replace the range condition with multiple equality conditions.
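Both limitations can be seen in the same sorted-tuple model. Again, this is only a sketch with made-up rows and a hypothetical seek helper: the index narrows the search to a contiguous run, and any condition on a column to the right of a skipped or range-matched column has to be checked entry by entry.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows, kept sorted like key(last_name, first_name, dob).
index = sorted([
    ("Smith", "Anna", "1976-12-23"),
    ("Smith", "John", "1976-12-23"),
    ("Smith", "Jane", "1981-02-14"),
    ("Smith", "Zoe", "1976-12-23"),
    ("Taylor", "John", "1976-12-23"),
])

HIGH = "\uffff"  # sentinel that sorts after every name used here


def seek(lo_key, hi_key):
    """Return the contiguous run of entries with lo_key <= entry <= hi_key."""
    return index[bisect_left(index, lo_key):bisect_right(index, hi_key)]


# WHERE last_name = 'Smith' AND dob = '1976-12-23' (first_name skipped):
# only last_name narrows the run; dob is filtered afterwards.
examined_skip = seek(("Smith",), ("Smith", HIGH))
matched_skip = [e for e in examined_skip if e[2] == "1976-12-23"]

# WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '1976-12-23':
# the first two columns narrow the run; dob is still filtered afterwards,
# because dates are not contiguous across different first names.
examined_like = seek(("Smith", "J"), ("Smith", "J" + HIGH))
matched_like = [e for e in examined_like if e[2] == "1976-12-23"]

print(len(examined_skip), len(matched_skip))
print(len(examined_like), len(matched_like))
```

In the first query all 4 Smith entries are examined to find 3 matches; in the second, 2 entries are examined to find 1 match.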