探究MySQL优化器对索引和JOIN顺序的选择

Home

Database

Mysql Tutorial

探究MySQL优化器对索引和JOIN顺序的选择_MySQL

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 01, 2016 pm 01:00 PM

mysql optimizer

本文通过一个案例来看看MySQL优化器如何选择索引和JOIN顺序。表结构和数据准备参考本文最后部分"测试环境"。这里主要介绍MySQL优化器的主要执行流程，而不是介绍一个优化器的各个组件(这是另一个话题)。

我们知道，MySQL优化器只有两个自由度：顺序选择；单表访问方式；这里将详细剖析下面的SQL，看看MySQL优化器如何做出每一步的选择。

explain
select *
from
 employee as A,department as B
where
   A.LastName = 'zhou'
 and B.DepartmentID = A.DepartmentID
 and B.DepartmentName = 'TBX';

Copy after login

1. 可能的选择

这里看到JOIN的顺序可以是A|B或者B|A，单表访问方式也有多种，对于A表可以选择：全表扫描和索引`IND_L_D`(A.LastName = 'zhou')或者`IND_DID`(B.DepartmentID = A.DepartmentID)。对于B也有三个选择：全表扫描、索引IND_D、IND_DN。
2. MySQL优化器如何做
2.1 概述

MySQL优化器主要工作包括以下几部分：Query Rewrite(包括Outer Join转换等)、const table detection、range analysis、JOIN optimization(顺序和访问方式选择)、plan refinement。这个案例从range analysis开始。
2.2 range analysis

这部分包括所有Range和index merge成本评估(参考1 参考2)。这里，等值表达式也是一个range，所以这里会评估其成本，计算出found records(表示对应的等值表达式，大概会选择出多少条记录)。

本案例中，range analysis会针对A表的条件A.LastName = 'zhou'和B表的B.DepartmentName = 'TBX'分别做分析。其中：

表A A.LastName = 'zhou' found records: 51
表B B.DepartmentName = 'TBX' found records: 1

这两个条件都不是range，但是这里计算的值仍然会存储，在后面的ref访问方式评估的时候使用。这里的值是根据records_in_range接口返回，而对于InnoDB每次调用这个函数都会进行一次索引页的采样，这是一个很消耗性能的操作，对于很多其他的关系数据库是使用"直方图"的统计数据来避免这次操作(相信MariaDB后续版本也将实现直方图统计信息)。
2.3 顺序和访问方式的选择：穷举

MySQL通过枚举所有的left-deep树(也可以说所有的left-deep树就是整个MySQL优化器的搜索空间)，来找到最优的执行顺序和访问方式。
2.3.1 排序

优化器先根据found records对所有表进行一个排序，记录少的放前面。所以，这里顺序是B、A。
2.3.2 greedy search

当表的数量较少(少于search_depth，默认是63)的时候，这里直接蜕化为一个穷举搜索，优化器将穷举所有的left-deep树找到最优的执行计划。另外，优化器为了减少因为搜索空间庞大带来巨大的穷举消耗，所以使用了一个"偷懒"的参数prune_level(默认打开)，具体如何"偷懒"，可以参考JOIN顺序选择的复杂度。不过至少需要有三个表以上的关联才会有"偷懒"，所以本案例不适用。
2.3.3 穷举

JOIN的第一个表可以是：A或者B；如果第一个表选择了A，第二个表可以选择B；如果第一个表选择了B，第二个表可以选择A；

因为前面的排序，B表的found records更少，所以JOIN顺序穷举时的第一个表先选择B(这个是有讲究的)。

(*) 选择第一个JOIN的表为B
(**) 确定B表的访问方式
因为B表为第一个表，所以无法使用索引IND_D(B.DepartmentID = A.DepartmentID)，而只能使用IND_DN(B.DepartmentName = 'TBX')
使用IND_DN索引的成本计算：1.2；其中IO成本为1。
是否使用全表扫描：这里会比较使用索引的IO成本和全表扫描的IO成本，前者为1，后者为2；所以忽略全表扫描
所以，B表的访问方式ref，使用索引IND_D

(**) 从剩余的表中穷举选出第二个JOIN的表，这里剩余的表为：A
(**) 将A表加入JOIN，并确定其访问方式
可以使用的索引为：`IND_L_D`(A.LastName = 'zhou')或者`IND_DID`(B.DepartmentID = A.DepartmentID)
依次计算使用索引IND_L_D、IND_DID的成本：
(***) IND_L_D A.LastName = 'zhou'
在range analysis阶段给出了A.LastName = 'zhou'对应的记录约为：51。
所以，计算IO成本为：51；ref做IO成本计算时会做一次修正，将其修正为worst_seek(参考)
修正后IO成本为：15，总成本为：25.2
(***) IND_DID B.DepartmentID = A.DepartmentID
这是一个需要知道前面表的结果，才能计算的成本。所以range analysis是无法分析的
这里，我们看到前面表为B，found_record是1，所以A.DepartmentID只需要对应一条记录就可以了
因为具体取值不知道，也没有直方图，所以只能简单依据索引统计信息来计算：
索引IND_DID的列A.DepartmentID的Cardinality为1349，全表记录数为1349
所以，每一个值对应一条记录，而前面表B只有一条记录，所以这里的found_record计算为1*1 = 1
所以IO成本为：1，总成本为1.2
(***) IND_L_D成本为25.2；IND_DID成本为1.2，所以选择后者为当前表的访问方式
(**) 确定A使用索引IND_DID，访问方式为ref
(**) JOIN顺序B|A，总成本为：1.2+1.2 = 2.4

(*) 选择第一个JOIN的表为A
(**) 确定A表的访问方式
因为A表是第一个表，所以无法使用索引`IND_DID`(B.DepartmentID = A.DepartmentID)
那么只能使用索引`IND_L_D`(A.LastName = 'zhou')
使用IND_L_D索引的成本计算，总成本为25.2；参考前面计算；
(**) 这里访问A表的成本已经是25.2，比之前的最优成本2.4要大，忽略该顺序
所以，这次穷举搜索到此结束

把上面的过程简化如下：

(*) 选择第一个JOIN的表为B
(**) 确定B表的访问方式
(**) 从剩余的表中穷举选出第二个JOIN的表，这里剩余的表为：A
(**) 将A表加入JOIN，并确定其访问方式
(***) IND_L_D A.LastName = 'zhou'
(***) IND_DID B.DepartmentID = A.DepartmentID
(***) IND_L_D成本为25.2；IND_DID成本为1.2，所以选择后者为当前表的访问方式
(**) 确定A使用索引IND_DID，访问方式为ref
(**) JOIN顺序B|A，总成本为：1.2+1.2 = 2.4

(*) 选择第一个JOIN的表为A
(**) 确定A表的访问方式
(**) 这里访问A表的成本已经是25.2，比之前的最优成本2.4要大，忽略该顺序

至此，MySQL优化器就确定了所有表的最佳JOIN顺序和访问方式。
3. 测试环境

MySQL: 5.1.48-debug-log innodb plugin 1.0.9

CREATE TABLE `department` (
 `DepartmentID` int(11) DEFAULT NULL,
 `DepartmentName` varchar(20) DEFAULT NULL,
 KEY `IND_D` (`DepartmentID`),
 KEY `IND_DN` (`DepartmentName`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;

CREATE TABLE `employee` (
 `LastName` varchar(20) DEFAULT NULL,
 `DepartmentID` int(11) DEFAULT NULL,
 KEY `IND_L_D` (`LastName`),
 KEY `IND_DID` (`DepartmentID`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;

for i in `seq 1 1000` ; do mysql -vvv -uroot test -e 'insert into department values (600000*rand(),repeat(char(65+rand()*58),rand()*20))'; done
for i in `seq 1 1000` ; do mysql -vvv -uroot test -e 'insert into employee values (repeat(char(65+rand()*58),rand()*20),600000*rand())'; done

for i in `seq 1 50` ; do mysql -vvv -uroot test -e 'insert into employee values ("zhou",27760)'; done
for i in `seq 1 200` ; do mysql -vvv -uroot test -e 'insert into employee values (repeat(char(65+rand()*58),rand()*20),27760)'; done
for i in `seq 1 1` ; do mysql -vvv -uroot test -e 'insert into department values (27760,"TBX")'; done

show index from employee;
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| employee |     1 | IND_L_D |      1 | LastName   | A     |    1349 |   NULL | NULL  | YES | BTREE   |     |
| employee |     1 | IND_DID |      1 | DepartmentID | A     |    1349 |   NULL | NULL  | YES | BTREE   |     |
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+

show index from department;
+------------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| department |     1 | IND_D  |      1 | DepartmentID  | A     |    1001 |   NULL | NULL  | YES | BTREE   |     |
| department |     1 | IND_DN  |      1 | DepartmentName | A     |    1001 |   NULL | NULL  | YES | BTREE   |     |
+------------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+

Copy after login

4. 构造一个Bad case

因为关联条件中MySQL使用索引统计信息做成本预估，所以数据分布不均匀的时候，就容易做出错误的判断。简单的我们构造下面的案例：

表和索引结构不变，按照下面的方式构造数据：

for i in `seq 1 10000` ; do mysql -uroot test -e 'insert into department values (600000*rand(),repeat(char(65+rand()*58),rand()*20))'; done
for i in `seq 1 10000` ; do mysql -uroot test -e 'insert into employee values (repeat(char(65+rand()*58),rand()*20),600000*rand())'; done

for i in `seq 1 1` ; do mysql -uroot test -e 'insert into employee values ("zhou",27760)'; done
for i in `seq 1 10` ; do mysql -uroot test -e 'insert into department values (27760,"TBX")'; done
for i in `seq 1 1000` ; do mysql -uroot test -e 'insert into department values (27760,repeat(char(65+rand()*58),rand()*20))';
done

explain
select *
from
 employee as A,department as B
where
   A.LastName = 'zhou'
 and B.DepartmentID = A.DepartmentID
 and B.DepartmentName = 'TBX';
+----+-------------+-------+------+-----------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys  | key   | key_len | ref         | rows | Extra    |
+----+-------------+-------+------+-----------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE   | A   | ref | IND_L_D,IND_DID | IND_L_D | 43   | const        |  1 | Using where |
| 1 | SIMPLE   | B   | ref | IND_D,IND_DN  | IND_D  | 5    | test.A.DepartmentID |  1 | Using where |
+----+-------------+-------+------+-----------------+---------+---------+---------------------+------+-------------+

Copy after login

可以看到这里，MySQL执行计划对表department使用了索引IND_D，那么A表命中一条记录为(zhou,27760)；根据B.DepartmentID=27760将返回1010条记录，然后根据条件DepartmentName = 'TBX'进行过滤。

这里可以看到如果B表选择索引IND_DN，效果要更好，因为DepartmentName = 'TBX'仅仅返回10条记录，再根据条件A.DepartmentID=B.DepartmentID过滤之。

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

3 weeks ago By DDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months ago By DDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7632

CakePHP Tutorial

1390

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

143

Related knowledge

How to open phpmyadmin Apr 10, 2025 pm 10:51 PM

You can open phpMyAdmin through the following steps: 1. Log in to the website control panel; 2. Find and click the phpMyAdmin icon; 3. Enter MySQL credentials; 4. Click "Login".

MySQL: An Introduction to the World's Most Popular Database Apr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

How to use single threaded redis Apr 10, 2025 pm 07:12 PM

Redis uses a single threaded architecture to provide high performance, simplicity, and consistency. It utilizes I/O multiplexing, event loops, non-blocking I/O, and shared memory to improve concurrency, but with limitations of concurrency limitations, single point of failure, and unsuitable for write-intensive workloads.

MySQL's Place: Databases and Programming Apr 13, 2025 am 12:18 AM

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

Why Use MySQL? Benefits and Advantages Apr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

How to connect to the database of apache Apr 13, 2025 pm 01:03 PM

Apache connects to a database requires the following steps: Install the database driver. Configure the web.xml file to create a connection pool. Create a JDBC data source and specify the connection settings. Use the JDBC API to access the database from Java code, including getting connections, creating statements, binding parameters, executing queries or updates, and processing results.

Monitor Redis Droplet with Redis Exporter Service Apr 10, 2025 pm 01:36 PM

Effective monitoring of Redis databases is critical to maintaining optimal performance, identifying potential bottlenecks, and ensuring overall system reliability. Redis Exporter Service is a powerful utility designed to monitor Redis databases using Prometheus. This tutorial will guide you through the complete setup and configuration of Redis Exporter Service, ensuring you seamlessly build monitoring solutions. By studying this tutorial, you will achieve fully operational monitoring settings

How to view sql database error Apr 10, 2025 pm 12:09 PM

The methods for viewing SQL database errors are: 1. View error messages directly; 2. Use SHOW ERRORS and SHOW WARNINGS commands; 3. Access the error log; 4. Use error codes to find the cause of the error; 5. Check the database connection and query syntax; 6. Use debugging tools.

See all articles