index merge的补充说明-Mysql Tutorial-php.cn

在除了前面介绍的常见index merge的案例(Index Merge Union Access Algorithm)之外，还有一类很少见也比较特殊的index merge，多个索引扫描后进行交集，即 Index Merge Intersection。这类执行计划比较少见(因为MySQL需要ROR的原因)，但是，在合适的场景使用，效率仍然会有很大的提示，本文将看看MySQL优化器如何评估和选择此类执行计划。MySQL手册对此只是三言两语简单介绍了一下，这里做个较为详细的说明。

这类执行计划完整名称应该是：The Index Merge Intersection Access Algorithm，下文简称Intersection。

1. 为什么需要考虑Intersection

考虑如下查询:

SELECT COUNT(*) FROM t1 WHERE key1=1 AND key2=1;

Copy after login

优化器可以考虑使用索引key1或者key2进行REF/Range访问，如果使用key1，那么key2=1则作为过滤条件。另外，优化器还会考虑使用Intersection，即同时使用索引key1和key2。这样做可能的好处是：

(a) 如果两次索引扫描后做交集，如果最后ROWID很少，则回表次数大大减少

(b) 如果扫描这两个索引能是覆盖扫描的话，则无需回表

对比ref/range访问方式，index merge需要额外多访问一个索引，ROWID需要做交集，所以需要额外的比较操作。优化器将各自计算ref/range和index merge的成本，然后选择成本较低作为最终的执行计划。

2. MySQL优化器的Intersection

前面描述了Intersection的两个好处，MySQL优化器先使用了一个较为复杂的算法来预估合并后ROWID数量；另外，如果发现有覆盖扫描，则无需回表，则成本会大大减少。

另外，因为index merge通常需要访问两个以上索引，成本通常不抵，MySQL选择Intersection的时候，加上了一个额外的要求：

(a) 只有ROR类型的索引使用才能作为Intersection执行计划的一部分(什么是ROR)

3. 优化器如何筛选Intersection使用的索引

3.1 算法说明

这里分了两个部分，先使用贪婪算法在所有的ROR索引中，组合出一组成本最小的做Intersection。如果这个“最小组合”不是覆盖索引，而且又存在覆盖索引，那么再做一次贪婪算法找到一个成本最小的覆盖查询，如果成本更小则选择之。

3.1.1 找到成本最小的ROR组合

这是一个贪婪算法，找到未必是全局最优的结果。这里简单描述一下算法(可以参考get_best_ror_intersect的注释和实现)：

初始：R是所有可用的ROR索引查询；S是空集；
R中的记录是按照需要扫描索引的大小排序(E(#records_matched) * key_record_length)
  S= first(R); R= R-first(S);
  min_cost= cost(S); min_scan= make_scan(S);
  while (R is not empty)
  {
    firstR= R - first(R);
    if (!selectivity(S + firstR 
<p>算法说明：每次从所有ROR中取出扫描成本最低的索引，判断加入该索引后成本是否会下降。如果成本下降，则将本ROR加入结果集；如果成本不会下降，那么忽略；</p>
<p>除此，MySQL还做了一个判断，如果新增ROR索引之后，会计算其选择度(selectivity)，只有当新增ROR索引会降低整体区分度的时候，这个索引才会被加入其中。这部分计算的目的，一方面是保证新增索引后一定会降低选择度，这通常都是满足的，只要新增的索引条件不是S集合的子集，一般都是满足的；另一方面，会顺便计算出新增索引后的选择度，这样就可以计算，多个索引合并后返回的记录数大约是多少。下面会单独介绍MySQL如何预估，两个条件交集命中的记录数。</p>
<h5 id="计算两个索引交集命中的记录数">3.1.2 计算两个索引交集命中的记录数</h5>
<p>这个问题的抽象如下：有如下条件key1_p1=c1 and key1_p1=c2 and key2_p1=c3 and key2_p2=c4，现在已知key1_p1=c1 and key1_p1=c2的选择度是X，key2_p1=c3 and key2_p2=c4的选择度是Y，问，总体选择度是多少？</p>
<p>如果key1和key2是完全独立的，没有任何字段重复，那么按照均匀计算，交集后，总体选择度为X*Y，这部分是较为容易理解的。</p>
<p>如果key1和key2不是独立的，问题就较为复杂了，例如，key1_p1 = c1 和 key2_p1=c3 是两个一样的重复的条件，即索引key1和key2的某个字段相同。那么，如果按照上面的公式计算就非常不准确了。MySQL计算的办法，是逐个添加：</p>
<p>假设有集合A={key1_p1 = c1, key1_p1=c2}，对应的选择度记为P(A)，如果有索引条件：key2_p1=c3 and key2_p2=c4，MySQL先将key2_p1=c3加入集合A，并计算选择度；然后把key2_p2=c4加入集合A，并计算选择度。进一步抽象，有集合A，已知选择度为P(A)，现有索引条件key2对应的两个AND条件为\(b_1\)和\(b_2\)，现在演示如何逐个将\(b_1\)和\(b_2\)加入集合A并计算其选择度。</p>
<p>已知集合A，其选择度为P(A)；索引条件\(b_1\) and \(b_2\)；并记 \(B_1 = \{b_1\}，B_2 = \{b_2\}；\)；</p>
<p>记R为该表总记录数，\(R(b_1)\)表示条件\(b_1\)对应的记录数，可以通过函数records_in_range计算；</p>
<p>\(P(X|Y)\)表示Y条件发生时的条件概率，这里假设都是均匀分布，选择度就是概率。且有P(X|Y) = P(X)*P(Y|X)；</p>
<p>那么，将集合\(B_1\)合并到集合A之后，选择度计算为：</p>
<p>\[P(A\cap B_1) = P(A)*P(B_1|A) \] </p>
<p>(1) 如果A，\(B_1\)不独立，即对应条件\(b_1\)属于集合A，那么，\(P(B_1|A) = 1\)。那么选择度不变，仍然是\(P(A)\);</p>
<p>(2) 如果A，\(B_1\)独立，对应条件\(b_1\)<strong>不</strong>属于集合A，那么有</p>
<p>\[P(A\cap B_1) = P(A)*P(B_1) \] </p>
<p>\[P(B_1) = \frac{R(b_1)}{R}\]</p>
<p>\[P(A\cap B_1) = P(A)*\frac{R(b_1)}{R} \] </p>
<p>这时就可以把条件\(b_1\)并入集合A，对应的选择度如上式。继续，考虑把条件\(b_2\)加入合计A。</p>
<p>\[P((A \cap B_1) \cap B_2) =  P(A)*\frac{R(b_1)}{R}*P(B_2|A \cap B_1)   \]</p>
<p>同样的，如果\(B_2\)和\(A \cap B_1\)不独立，即\(B_2\)是\(\{x|x \in A 或者 x \in B_1 \}\)的子集，那么</p>
<p>\[P(B_2|A \cap B_1) = 1\]</p>
<p>\[P((A \cap B_1) \cap B_2) =  P(A)*\frac{R(b_1)}{R} \]</p>
<p>如果两者独立，继续计算：</p>
<p>\[P(B_2|A \cap B_1) = P(B_2) = \frac{R(b_1 and b_2)}{R(b_1)} \]</p>
<p>\[P((A \cap B_1) \cap B_2) = P(A)*\frac{R(b_1)}{R} * \frac{R(b_1 and b_2)}{R(b_1)} = P(A)*\frac{R(b_1 and b_2)}{R} \]</p>
<p>MySQL将使用上面的方法计算多个条件合并的时候的选择度。 MySQL通过records_in_range来计算\(R(b_1 and b_2)\)。</p>
<p>MySQL在实现的时候，略有不同的地方是，为了尽可能少的避免records_in_range的调用次数，如果连续的多个条件都是同时独立或者同时都不独立，那么则会将这多个条件作为一个整理来计算。</p>
<h5 id="找到成本最小覆盖索引组合">3.1.3 找到成本最小覆盖索引组合</h5>
<p>如果前面找到ROR组合不是覆盖查询，而且又存多个索引组合的覆盖索引的话，MySQL还会再做一次贪婪查找，尝试找到最优的覆盖索引组合，如果成本比之前的"最小成本"更小，则选择这组索引。</p>
<p>这部分实现参考函数get_best_covering_ror_intersect，没有特别需要说明的。</p>
<h3 id="成本的计算">4. 成本的计算</h3>
<p>如果上面计算好了选择度，<strong>Intersection</strong>的成本计算就很简单了。每次新增一个索引到index merge中的时候，先计算各个索引读取的成本(参考)，如果不是覆盖扫描则需要额外加上，根据ROWID取出记录的成本(参考)。</p>
<h3 id="Intersection的案例">5. Intersection的案例</h3>
<pre class="brush:php;toolbar:false">CREATE TABLE `tmp_index_merge` (
  `id` int(11) NOT NULL,
  `key1_part1` int(11) NOT NULL,
  `key1_part2` int(11) NOT NULL,
  `key2_part1` int(11) NOT NULL,
  `key2_part2` int(11) NOT NULL,
  `key2_part3` int(11) NOT NULL,
  `key3_part1` int(11) NOT NULL DEFAULT '4',
  PRIMARY KEY (`id`),
  KEY `ind2` (`key2_part1`,`key2_part2`,`key2_part3`),
  KEY `ind1` (`key1_part1`,`key1_part2`,`id`),
  KEY `ind3` (`key3_part1`,`id`)
) ENGINE=InnoDB
for i in `seq 1 5000` ; do mysql -vvv -uroot test \
-e 'insert into tmp_index_merge values (60000*rand(),5000*rand(),\
*rand(),5000*rand(),5000*rand(),5000*rand(),2877)'; done
for i in `seq 1 5000` ; do mysql -vvv -uroot test \
-e 'insert into tmp_index_merge values (600000*rand(),4333,1657,\
*rand(),5000*rand(),5000*rand(),5000*rand())'; done
explain select count(*) from tmp_index_merge where 
(key1_part1 = 4333 and key1_part2 = 1657) and (key3_part1 = 2877)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tmp_index_merge
         type: index_merge
possible_keys: ind1,ind3
          key: ind3,ind1
      key_len: 4,8
          ref: NULL
         rows: 3622
        Extra: Using intersect(ind3,ind1); Using where; Using index

Copy after login

如果不满足ROR的条件，例如将上面案例的ind3索引的ID字段去掉，则不会再考虑使用Intersection。

alter table tmp_index_merge drop index ind3,add KEY `ind3` (`key3_part1`);
Query OK, 14137 rows affected (1.15 sec)
Records: 14137  Duplicates: 0  Warnings: 0
root@test 04:32:58>explain select * from tmp_index_merge where 
(key1_part1 = 4333 and key1_part2 = 1657) and (key3_part1 = 2877)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tmp_index_merge
         type: ref
possible_keys: ind1,ind3
          key: ind1
      key_len: 8
          ref: const,const
         rows: 3408
        Extra: Using where

Copy after login

6. 最后

Intersection这类执行计划，因为需要满足ROR条件，所以较为少见。理想情况是，覆盖但非ROR成本也可能会很低，但是MySQL不考虑这点。另外，较新版本开始支持Index Condition Pushdown，这会大大降低选择ref/range的执行成本，Intersection的优势会大大下降。

到此，MySQL index merge调研就告一段落了。

原文地址：index merge的补充说明, 感谢原作者分享。

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055523 fails to install in Windows 11?

3 weeks ago By DDD

How to fix KB5055518 fails to install in Windows 10?

3 weeks ago By DDD

Roblox: Dead Rails - How To Tame Wolves

4 weeks ago By DDD

Roblox: Grow A Garden - Complete Mutation Guide

2 weeks ago By DDD

Strength Levels for Every Enemy & Monster in R.E.P.O.

4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial

1659

CakePHP Tutorial

1415

Laravel Tutorial

1310

PHP Tutorial

1258

C# Tutorial

1232

Related knowledge

What file is index.html? Feb 19, 2024 pm 01:36 PM

index.html represents the home page file of the web page and is the default page of the website. When a user visits a website, the index.html page is usually loaded first. HTML (HypertextMarkupLanguage) is a markup language used to create web pages, and index.html is also an HTML file. It contains the structure and content of a web page, as well as tags and elements used for formatting and layout. Here is an example index.html code: &lt

Detailed introduction to what wapi is Jan 07, 2024 pm 09:14 PM

Users may have seen the term wapi when using the Internet, but for some people they definitely don’t know what wapi is. The following is a detailed introduction to help those who don’t know to understand. What is wapi: Answer: wapi is the infrastructure for wireless LAN authentication and confidentiality. This is like functions such as infrared and Bluetooth, which are generally covered near places such as office buildings. Basically they are owned by a small department, so the scope of this function is only a few kilometers. Related introduction to wapi: 1. Wapi is a transmission protocol in wireless LAN. 2. This technology can avoid the problems of narrow-band communication and enable better communication. 3. Only one code is needed to transmit the signal

Detailed explanation of whether win11 can run PUBG game Jan 06, 2024 pm 07:17 PM

Pubg, also known as PlayerUnknown's Battlegrounds, is a very classic shooting battle royale game that has attracted a lot of players since its popularity in 2016. After the recent launch of win11 system, many players want to play it on win11. Let's follow the editor to see if win11 can play pubg. Can win11 play pubg? Answer: Win11 can play pubg. 1. At the beginning of win11, because win11 needed to enable tpm, many players were banned from pubg. 2. However, based on player feedback, Blue Hole has solved this problem, and now you can play pubg normally in win11. 3. If you meet a pub

Introduction to Python functions: Introduction and examples of exec function Nov 03, 2023 pm 02:09 PM

Introduction to Python functions: Introduction and examples of exec function Introduction: In Python, exec is a built-in function that is used to execute Python code stored in a string or file. The exec function provides a way to dynamically execute code, allowing the program to generate, modify, and execute code as needed during runtime. This article will introduce how to use the exec function and give some practical code examples. How to use the exec function: The basic syntax of the exec function is as follows: exec

Detailed introduction to whether i5 processor can install win11 Dec 27, 2023 pm 05:03 PM

i5 is a series of processors owned by Intel. It has various versions of the 11th generation i5, and each generation has different performance. Therefore, whether the i5 processor can install win11 depends on which generation of the processor it is. Let’s follow the editor to learn about it separately. Can i5 processor be installed with win11: Answer: i5 processor can be installed with win11. 1. The eighth-generation and subsequent i51, eighth-generation and subsequent i5 processors can meet Microsoft’s minimum configuration requirements. 2. Therefore, we only need to enter the Microsoft website and download a "Win11 Installation Assistant" 3. After the download is completed, run the installation assistant and follow the prompts to install Win11. 2. i51 before the eighth generation and after the eighth generation

Introducing the latest Win 11 sound tuning method Jan 08, 2024 pm 06:41 PM

After updating to the latest win11, many users find that the sound of their system has changed slightly, but they don’t know how to adjust it. So today, this site brings you an introduction to the latest win11 sound adjustment method for your computer. It is not difficult to operate. And the choices are diverse, come and download and try them out. How to adjust the sound of the latest computer system Windows 11 1. First, right-click the sound icon in the lower right corner of the desktop and select "Playback Settings". 2. Then enter settings and click "Speaker" in the playback bar. 3. Then click "Properties" on the lower right. 4. Click the "Enhance" option bar in the properties. 5. At this time, if the √ in front of "Disable all sound effects" is checked, cancel it. 6. After that, you can select the sound effects below to set and click

Introduction to edge shortcut keys Jul 12, 2023 pm 05:57 PM

In today's fast life, in order to improve work efficiency, shortcut keys are an essential work requirement. A shortcut key is a key or key combination that provides an alternative way to perform an action normally performed using a mouse. So what are the edge shortcut keys? What are the functions of edge shortcut keys? The editor below has compiled an introduction to edge shortcut keys. Friends who are interested should come and take a look! Ctrl+D: Add the current page to favorites or reading list Ctrl+E: Perform a search query in the address bar Ctrl+F: Find on the page Ctrl+H: Open the history panel Ctrl+G: Open the reading list panel Ctrl +I: Open the favorites list panel (the test does not seem to work) Ctrl+J: Open

What is Dogecoin Apr 01, 2024 pm 04:46 PM

Dogecoin is a cryptocurrency created based on Internet memes, with no fixed supply cap, fast transaction times, low transaction fees, and a large meme community. Uses include small transactions, tips, and charitable donations. However, its unlimited supply, market volatility, and status as a joke coin also bring risks and concerns. What is Dogecoin? Dogecoin is a cryptocurrency created based on internet memes and jokes. Origin and History: Dogecoin was created in December 2013 by two software engineers, Billy Markus and Jackson Palmer. Inspired by the then-popular "Doge" meme, a comical photo featuring a Shiba Inu with broken English. Features and Benefits: Unlimited Supply: Unlike other cryptocurrencies such as Bitcoin

See all articles