This article is an advanced study of MySQL. It will give you a detailed understanding of how to create a more suitable index. I hope it will be helpful to you!
You often do not realize how important indexes are until the database holds a lot of data, let alone how important an appropriate index is. This article introduces how to create efficient and appropriate indexes.
In this example, the two SQL statements return the same result, but their execution plans differ. The plan whose type is index is far less efficient than the plan whose type is const, which comes from the where condition on actor_id with the value 4. Using an expression on the index column affects the execution plan. For the meaning of type, please refer to the detailed explanation of explain.
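A minimal sketch of this kind of comparison, assuming the actor table of the sakila sample database, whose primary key is actor_id. The two statements are illustrative reconstructions, not necessarily the exact ones behind the original screenshots:

```sql
-- Expression on the indexed column: mysql cannot seek on actor_id,
-- so it has to evaluate actor_id + 1 for every entry it scans.
explain select actor_id from actor where actor_id + 1 = 5;

-- Bare indexed column compared with a constant: a single primary key lookup.
explain select actor_id from actor where actor_id = 4;
```

Comparing the type column of the two plans shows the effect. Keeping the indexed column alone on one side of the comparison is what lets mysql use the index for a lookup.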
Almost all of our tables have a primary key, so in everyday development, query through an index whenever one can be used, and prefer the primary key index when it fits.
Many of our indexes are actually on strings, and long strings inevitably appear, which makes the index take up too much space and reduces its efficiency, especially for long columns such as blob, text, and varchar. The way to deal with this is not to index the full value of the field but only a leading prefix of it, chosen so that the selectivity of the prefix index is close to that of the entire column. This can greatly reduce the index space and thus improve efficiency. The disadvantage is that it reduces the selectivity of the index.
Index selectivity: the ratio of distinct index values to the total number of records in the table (#T), ranging from 1/#T to 1. The higher the selectivity, the higher the query efficiency, because highly differentiated data lets the index filter out more rows. A unique index has a selectivity of 1, which is the best case.
For example, take the email field in a company's employee table, where every address has the same suffix, such as xxxx@qq.com. The only part of the email that is useful for an index is xxxx, because @qq.com is the same in every row and is meaningless for the index. An index on xxxx alone has the same selectivity as one on the entire value, but it obviously takes less index space.
Below we use the employee table as an example (see the end of the article for table structure and data)
We take the email field as an example to create an index:
Each email in this data is a mobile phone number followed by @qq.com, so the first 11 characters differ from row to row and everything after them is identical. I use the following sql to calculate the selectivity of prefixes of length 10, 11, and 12.
-- With an 11-character prefix the selectivity is already 1; a longer prefix does not change it
select count(distinct left(email,10))/count(*) as e10,
       count(distinct left(email,11))/count(*) as e11,
       count(distinct left(email,12))/count(*) as e12
from employee;
The result shows that the selectivity of the first 10, 11, and 12 characters is 0.14, 1.0, and 1.0 respectively. The selectivity already reaches the maximum of 1 at 11 characters, so there is no need to index the whole value, which would only increase the index space.
-- Create the prefix index
alter table employee add key(email(11));
We can also use count to see how often each prefix occurs (the fewer the occurrences, the lower the repetition rate and the greater the selectivity).
-- Find how often each prefix occurs
select count(*) as cnt, left(email,11) as pref
from employee
group by pref
order by cnt desc
limit 10;
We often need to sort with order by, but order by has a significant performance cost: it sorts by loading data into memory, and if the amount of data is too large to fit in memory, it can only be processed in multiple batches. An index, however, is already ordered, so it is easier to complete the sort directly through the index.
Scanning the index itself is fast because you only need to move from one index record to the next. But if the index does not cover all the columns the query needs, then for every index record scanned you have to go back to the table and look up the corresponding row. This is basically random IO, so reading data in index order is usually slower than a sequential full table scan.
Mysql can use the same index both to find rows and to sort them. Consider creating such an index whenever possible.
Mysql can use an index to sort the results only when the order of the index columns is completely consistent with the order of the order by clause and all columns are sorted in the same direction (ascending or descending). If the query joins multiple tables, index sorting can be used only when all the fields in the order by clause come from the first table. The order by clause also needs to satisfy the leftmost prefix of the combined index; otherwise index sorting cannot be used.
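As a rough illustration of the leftmost-prefix rule for sorting, here is a sketch against the employee table from the end of this article, which has the combined index employee_u1 on (employee_num, name). The queries are my own examples, and the plan mysql actually picks can vary with version and data:

```sql
-- order by follows the index columns (employee_num, name) in the same direction:
-- rows can be read in index order, so a separate sort should not be needed.
explain select employee_num, name from employee order by employee_num, name;

-- order by skips the leftmost column employee_num: the index order does not help,
-- so the Extra column typically shows "Using filesort".
explain select employee_num, name from employee order by name;
```

The thing to look for is whether "Using filesort" appears in the Extra column of each plan.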
In fact, there are two main points to pay attention to during development:
In this comparison, union all is executed as two queries, while in and or are each executed only once. The execution plans of or and in are the same, but their execution times differ. Use set profiling=1 to record detailed timing and show profiles to view the time of each statement. In the test, or took 0.00612000 and in took 0.00022800. The gap is very large, even though the test table has only 200 rows.
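A sketch of how such a timing comparison can be set up. It assumes the actor table of the sakila sample database, and the ids 1 and 2 are arbitrary illustrative values. Timings will differ between machines and runs, so this shows the procedure, not the numbers:

```sql
-- Turn on profiling for the current session
-- (deprecated in newer mysql versions, but still available).
set profiling = 1;

-- Three ways to fetch the same two rows.
select * from actor where actor_id = 1
union all
select * from actor where actor_id = 2;

select * from actor where actor_id in (1, 2);

select * from actor where actor_id = 1 or actor_id = 2;

-- List the recorded statements together with their durations.
show profiles;
```

Running each statement several times before comparing helps separate real differences from cache warm-up noise.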
union all: the query is divided into two stages. There is also union; when a union is really needed in daily development, union all is recommended as long as duplicate rows do not have to be removed, because union has an additional distinct deduplication step. So try to use union all.
Range conditions: >, >=, <, <=, between
A range column can use the index, but the columns after the range column cannot (the index can be used for at most one range column).
For example, with a combined index on (age, name), if the query condition is where age>18 and name="Ji", the name column after the range condition is not used for indexing.
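To see the range rule on a combined index, one possible sketch is below. The person table and the index name idx_age_name are hypothetical, made up only for this illustration:

```sql
-- Hypothetical table, used only to illustrate the rule.
create table person (
  id bigint not null auto_increment,
  age int not null,
  name varchar(60) not null,
  primary key (id),
  key idx_age_name (age, name)
);

-- Range on age: only the age part of idx_age_name can bound the index scan.
explain select * from person where age > 18 and name = 'Ji';

-- Equality on age: both age and name can bound the index scan.
explain select * from person where age = 18 and name = 'Ji';
```

With a reasonable amount of data in the table, comparing key_len in the two plans should show that the second query uses a longer part of the index than the first.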
I was once asked in an interview whether a not-equal condition can use an index. I had never paid attention to it and could not answer. This time I ran an experiment myself; please see the end of the article for the conclusion.
In the employee table, I defined the mobile field as varchar and indexed it. I then queried it once with a number and once with a string.
Look at the results: the type values of the two execution plans differ, and only the string query uses the index.
If the type of the value in the condition is inconsistent with the type defined in the table, mysql performs an implicit type conversion, and as a result the index is not used. During development, we need to pass values of the type that we defined for the column.
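The experiment can be repeated with something like the following sketch, which uses the employee table and sample data from the end of this article. The literal 15500000001 is simply one of the sample mobile values:

```sql
-- mobile is varchar: comparing it with a number forces a conversion on the column side,
-- so the index on mobile cannot be used for a lookup.
explain select * from employee where mobile = 15500000001;

-- Comparing with a string matches the column type, so the index on mobile can be used.
explain select * from employee where mobile = '15500000001';
```

The difference shows up in the type and key columns of the two plans.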
That is, indexed fields should be not null as far as possible, because null values can cause some unexpected problems. In actual work, though, it is hard to guarantee that a field is never null, so handle it according to the actual business and try to avoid this situation.
A table join is actually a nested-loop match across multiple tables, which affects performance. The data types of the join fields must be consistent to improve query efficiency. I will write a separate article about the principle of table joins later.
The function of limit is not just paging, its essential function is to limit output.
limit actually traverses the query data one by one. If you only need one piece of data, add limit 1: the scan then stops as soon as it finds a row that meets the conditions and returns it directly instead of continuing to check further rows. Without limit, the checking continues.
But if you are paging and want to fetch 5 items after the first 10,000, with limit 10000,5, you need to be careful. (The second number is a row count, so limit 10000,10005 would return up to 10,005 rows.) Mysql will traverse 10,000 items and then fetch 5, which is very inefficient. Tip: if the primary key is sequential, you can get the paging data directly through the primary key.
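A sketch of the two paging styles on the employee table. The value 10000 in the second query is an assumed "last employee_id of the previous page"; the two queries return the same rows only if the ids are contiguous and start at 1:

```sql
-- Offset paging: mysql reads and discards the first 10000 rows before returning 5.
select * from employee order by employee_id limit 10000, 5;

-- Paging by primary key: seek straight to the starting point through the primary key index.
-- 10000 stands for the last employee_id seen on the previous page (an assumed value).
select * from employee where employee_id > 10000 order by employee_id limit 5;
```

When ids have gaps, the usual approach is to remember the last employee_id actually returned and pass it into the next query instead of computing it from the page number.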
Building and maintaining an index has a price, and an index also takes up space. More indexes are not better; indexes must be used rationally.
The more fields an index has, the larger it is and the more storage space it occupies.
More indexes are not better, and indexes do not need to be designed at the moment you create the table. Premature optimization does not produce efficient indexes. You need to understand the business and weigh the statistics of the relevant business sql before building indexes. That way, the considerations will be more comprehensive and the indexes will be more effective and efficient.
The above are the small details corresponding to index optimization. I hope it can help you write smooth sql.
Supplement
About the question of whether a not-equal condition uses an index
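Conclusion: only the primary key uses the index; the unique key and the ordinary index do not.
In the employee table I created a unique index on employee_num and a combined index on employee_num + name, and the execution results confirmed this.
employee table structure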
CREATE TABLE `employee` (
  `employee_id` bigint(20) NOT NULL AUTO_INCREMENT,
  -- employee number
  `employee_num` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL COMMENT '员工编码',
  -- employee name
  `name` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL COMMENT '员工姓名',
  -- email
  `email` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL COMMENT '电子邮件',
  -- mobile phone
  `mobile` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL COMMENT '移动电话',
  -- gender, 0: male 1: female
  `gender` tinyint(1) NOT NULL COMMENT '性别, 0: 男 1: 女',
  PRIMARY KEY (`employee_id`) USING BTREE,
  INDEX `email`(`email`(11)) USING BTREE,
  INDEX `employee_u1`(`employee_num`, `name`) USING BTREE,
  UNIQUE INDEX `employee_u2`(`employee_num`) USING BTREE,
  INDEX `employee_u3`(`mobile`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 0 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci COMMENT = '员工表' ROW_FORMAT = Dynamic;
The employee data is as follows:
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES
  (10, '001', '员工A', '15500000001@qq.com', '15500000001', 1),
  (11, '002', '员工B', '15500000002@qq.com', '15500000002', 0),
  (12, '003', '员工C', '15500000003@qq.com', '15500000003', 0),
  (13, '004', '员工D', '15500000004@qq.com', '15500000004', 0),
  (14, '005', '员工E', '15500000005@qq.com', '15500000005', 1),
  (15, '006', '员工F', '15500000006@qq.com', '15500000006', 1),
  (16, '007', '员工G', '15500000007@qq.com', '15500000007', 0),
  (17, '008', '员工H', '15500000008@qq.com', '15500000008', 1),
  (18, '009', '员工I', '15500000009@qq.com', '15500000009', 1),
  (19, '010', '员工J', '15500000010@qq.com', '15500000010', 1),
  (20, '011', '员工K', '15500000011@qq.com', '15500000011', 1),
  (21, '012', '员工L', '15500000012@qq.com', '15500000012', 1),
  (22, '013', '员工M', '15500000013@qq.com', '15500000013', 0),
  (23, '014', '员工N', '15500000014@qq.com', '15500000014', 1);
The above is the detailed content of MySQL Advanced Learning: Detailed explanation of how to create efficient and appropriate indexes. For more information, please follow other related articles on the PHP Chinese website!