Home Database Mysql Tutorial MySQL InnoDB index principles and algorithms

MySQL InnoDB index principles and algorithms

Nov 27, 2019 pm 05:33 PM
innodb mysql Full text index index

Maybe you often use MySQL, and you also often use indexes, but you don’t know the principles and advanced functions of indexes. Let’s learn together here.

MySQL InnoDB index principles and algorithms

<span style="font-size: 20px;">InnoDB</span>Storage index

In the database , if there are too many indexes, application performance may be affected; if there are too few indexes, query performance may be affected. Therefore, we need to pursue a balance point between the two. Enough indexes will improve query performance, but not too many indexes, which will lead to excessive load when modifying data and other operations.

InnoDBSupports 3 common indexes:

●Hash index

B tree Index

●Full-text index

What we will explain in detail next isB tree index and full-text index.

Hash index

Before learning hash index, we first understand some basic knowledge: hash algorithm. The hash algorithm is a commonly used algorithm with a time complexity of O(1). It is not only used in indexes, but also in various database applications.

Hash table

Hash table(Hash Table) Also called hash table, it is composed of direct addressing table Improved.

MySQL InnoDB index principles and algorithms
In this table, U represents the complete set of keywords, K represents the actual keywords, and the array on the right (hash table) Represents a continuous space that can be directly addressed in memory, and the real address of the actual data is stored in the one-way linked list associated with each slot in the hash table.

If the array on the right directly uses the direct addressing table, then there will be a h[K] for each keyword K without repetition. This will cause some problems. If the amount of U data Too large and impractical for the computer's available capacity. And if the proportion of collection K to U is too small, most of the allocated space will be wasted.

So we use a hash table, and we determine the mapping relationship through some functions h(k), so that the discrete data can be distributed as evenly as possible using the slots in the array, but There will be a problem where multiple keywords are mapped to the same slot. This situation is called collision (collision) . The simplest solution is used in the database: linking method (chaining) . That is, each slot stores a single-linked list, and all colliding elements will form a node in the linked list in turn. If it does not exist, the linked list points to NULL.

The function used h(k) becomes the hash function, which must be able to hash well. It is best to avoid collisions or minimize collisions. Generally, in order to better handle hashed keywords, we will convert them into natural numbers, and then implement them through division hashing, multiplication hashing or global hashing. Databases generally use division hashing, that is, when there are m slots, we take modulo m for each key k: h(k) = k % m.

<span style="font-size: 18px;">InnoDB</span>Hash algorithm in storage engine

InnoDBThe storage engine uses a hash algorithm to search the dictionary, the conflict mechanism uses a linked list, and the hash function uses division hashing. For the hash table of the buffer pool, each page in the buffer pool has a chain pointer, pointing to the page with the same hash value. For a division hash, the value of m is a prime number slightly greater than 2 times the number of buffer pool pages. If the current innodb_buffer_pool_size size is 10M, there are a total of 640 16KB pages, and 1280 slots need to be allocated, which is slightly larger than The prime number is 1399, so a hash table with 1399 slots will be allocated to hash the pages in the query buffer pool.

For converting each page into a natural number, each table space has a space_id. What the user wants to query is a continuous 16KB in the space. Page, that is, offset (offset) , InnoDB shifts space_id to the left by 20 bits, plus space_id and offset, that is, K=space_id, and then hashed into each slot using division.

Adaptive hash index

The adaptive hash index is implemented using the above hash table and belongs to the internal mechanism of the database, DBA cannot intervene. It is only very fast for dictionary type searches, but is powerless for range searches, etc., such as:

select * from t where f='100';
Copy after login

We can check the usage of adaptive hash index:

mysql> show engine innodb status\G;
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2019-05-13 23:32:21 7f4875947700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 32 seconds
...
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1226, seg size 1228, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 276671, node heap has 1288 buffer(s)
0.16 hash searches/s, 16.97 non-hash searches/s
Copy after login

We can see The usage of adaptive hashing can be judged by hash searches/non-hash searches in the last line to judge the efficiency of using hash index.

We can use the innodb_adaptive_hash_index parameter to disable or enable this feature, which is enabled by default.

<span style="font-size: 20px;">B </span>Tree index

B Tree index is currently The most commonly used and effective index for searching in relational database systems. Its structure is similar to a binary tree, and data can be quickly found based on key-value pairs. B tree(balance tree) consists of Btree(banlance tree balanced binary tree)and index sequential access method(ISAM: Index Evolved from Sequence Access Method), these are all classic data structures. The MyISAM engine was originally designed with reference to the ISAM data structure.

Basic data structure

To understand B tree data structure, we first understand some basic knowledge.

Binary Search Method

Also known as the half search method, it refers to arranging the data in order, comparing each time with the intermediate value, and performing a skip search. An algorithm that reduces the range by half and quickly finds the target. Its algorithm complexity is log2(n), which is faster than sequential search.

As shown in the figure, searching for 48 from the ordered list only requires 3 steps:

MySQL InnoDB index principles and algorithms

For detailed algorithm, please refer to Binary Search Algorithm.

Binary search tree

The definition of a binary search tree is that in a binary tree, the value of the left subtree is always less than the root key value, and the root key value is always is less than the value of the right subtree. When we search, we start searching from the root every time, and judge whether to continue searching the left subtree or the right subtree based on the comparison result. The search method is very similar to the binary search method.

MySQL InnoDB index principles and algorithms

Balanced Binary Tree

The definition of a binary search tree is very broad and can be constructed arbitrarily, but in extreme cases The query efficiency is the same as sequential search, such as a binary search tree with only left subtree.

MySQL InnoDB index principles and algorithms

If you want to construct a binary search tree with maximum performance, you need the tree to be balanced, that is, a balanced binary tree (because its inventor is G. M. Adelson-Velsky and Evgenii Landis, also known as the AVL tree). It is defined as a binary search tree that must satisfy the maximum height difference of the two subtrees of any node to 1. The balanced binary tree has a relatively better structure, and the best performance requires the establishment of an optimal binary tree. However, due to the high cost of maintaining the tree, a balanced binary tree is generally sufficient.

Balanced binary tree query is very fast, but when the tree changes, it needs to achieve the new balance of the tree through one or more left and right rotations. I won’t talk about it here.

<span style="font-size: 18px;">B </span> Tree

After understanding the basic data structure, let’s take a look B The definition of the implementation of the tree is very complicated. Simply put, it is to add regulations to the B tree:

1. Leaf nodes store data, and non-leaf nodes Store pointers

2. All leaf nodes are recorded in a doubly linked list from left to right

The goal is a balanced search tree designed for disks or other direct access auxiliary devices. In this tree, all records are placed on the leaf nodes of the same layer according to the size of the key value. There are pointers between the leaf nodes to connect (non-continuous storage), forming a doubly linked list. The index node is constructed according to the balanced tree method, and there are pointers pointing to specific leaf nodes for quick search.

The following B When the tree has less data, the height is 2, and each page is fixed to store 4 records, fan out Fixed to 5 (gray part in the picture). Leaf nodes store multiple pieces of data in order to reduce the height of the tree and perform quick searches.

MySQL InnoDB index principles and algorithms

When we insert 28, 70, 95 3 pieces of data, B Because the tree is full of data, pages need to be split. At this time, the height changes to 3, and each page still has 4 records. The doubly linked list is not drawn but still exists. Now it can be seen that it is the prototype of a balanced binary tree.

MySQL InnoDB index principles and algorithms

##InnoDB<span style="font-size: 18px;"></span>B <span style="font-size: 18px;"></span> Tree index

InnoDBB+ 树索引的特点是高扇出性,因此一般树的高度为2~4层,这样我们在查找一条记录时只用I/O 2~4次。当前机械硬盘每秒至少100I/O/s,因此查询时间只需0.02~0.04s

数据库中的B+ 树索引分为聚集索引(clustered index)和辅助索引(secondary index)。它们的区别是叶子节点存放的是否为一整行的完整数据。

聚集索引

聚集索引就是按照每张表的主键(唯一)构造一棵B+ 树,同时叶子节点存放整行的完整数据,因此将叶子节点称为数据页。由于定义了数据的逻辑顺序,聚集索引也能快速的进行范围类型的查询。

聚集索引的叶子节点按照逻辑顺序连续存储,叶子节点内部物理上连续存储,作为最小单元,叶子节点间通过双向指针连接,物理存储上不连续,逻辑存储上连续。

聚集索引能够针对主键进行快速的排序查找和范围查找,由于是双向链表,因此在逆序查找时也非常快。

我们可以通过explain命令来分析MySQL数据库的执行计划:

# 查看表的定义,可以看到id为主键,name为普通列
mysql> show create table dimensionsConf;
| Table          | Create Table     
| dimensionsConf | CREATE TABLE `dimensionsConf` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `remark` varchar(1024) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `fullindex_remark` (`remark`)
) ENGINE=InnoDB AUTO_INCREMENT=178 DEFAULT CHARSET=utf8 |
1 row in set (0.00 sec)

# 先测试一个非主键的name属性排序并查找,可以看到没有使用到任何索引,且需要filesort(文件排序),这里的rows为输出行数的预估值
mysql> explain select * from dimensionsConf order by name limit 10\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dimensionsConf
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 57
        Extra: Using filesort
1 row in set (0.00 sec)

# 再测试主键id的排序并查找,此时使用主键索引,在执行计划中没有了filesort操作,这就是聚集索引带来的优化
mysql> explain select * from dimensionsConf order by id limit 10\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dimensionsConf
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: NULL
1 row in set (0.00 sec)

# 再查找根据主键id的范围查找,此时直接根据叶子节点的上层节点就可以快速得到范围,然后读取数据
mysql> explain select * from dimensionsConf where id>10 and id<p><strong>辅助索引</strong></p><p>辅助索引又称非聚集索引,其叶子节点不包含行记录的全部数据,而是包含一个书签<code>(bookmark)</code>,该书签指向对应行数据的聚集索引,告诉<code>InnoDB</code>存储引擎去哪里查找具体的行数据。辅助索引与聚集索引的关系就是结构相似、独立存在,但辅助索引查找非索引数据需要依赖于聚集索引来查找。<br></p><p><img src="https://img.php.cn/upload/image/639/959/687/157484689554534MySQL%20InnoDB%20index%20principles%20and%20algorithms" title="157484689554534MySQL InnoDB index principles and algorithms" alt="MySQL InnoDB index principles and algorithms"></p><p><span   style="max-width:90%"><strong>全文索引</strong></span></p><p>我们通过<code>B+</code> 树索引可以进行前缀查找,如:</p><pre class="brush:php;toolbar:false">select * from blog where content like 'xxx%';
Copy after login

只要为content列添加了B+ 树索引(聚集索引或辅助索引),就可快速查询。但在更多情况下,我们在博客或搜索引擎中需要查询的是某个单词,而不是某个单词开头,如:

select * from blog where content like '%xxx%';
Copy after login

此时如果使用B+ 树索引依然是全表扫描,而全文检索(Full-Text Search)就是将整本书或文章内任意内容检索出来的技术。

倒排索引

全文索引通常使用倒排索引(inverted index)来实现,倒排索引和B+ 树索引都是一种索引结构,它需要将分词(word)存储在一个辅助表(Auxiliary Table)中,为了提高全文检索的并行性能,共有6张辅助表。辅助表中存储了单词和单词在各行记录中位置的映射关系。它分为两种:

  • inverted file index(倒排文件索引),表现为{单词,单词所在文档ID}
  • full inverted index(详细倒排索引),表现为{单词,(单词所在文档ID, 文档中的位置)}

对于这样的一个数据表:

MySQL InnoDB index principles and algorithms

倒排文件索引类型的辅助表存储为:

MySQL InnoDB index principles and algorithms

详细倒排索引类型的辅助表存储为,占用更多空间,也更好的定位数据,比提供更多的搜索特性:

MySQL InnoDB index principles and algorithms

全文检索索引缓存

辅助表是存在与磁盘上的持久化的表,由于磁盘I/O比较慢,因此提供FTS Index Cache(全文检索索引缓存)来提高性能。FTS Index Cache是一个红黑树结构,根据(word, list)排序,在有数据插入时,索引先更新到缓存中,而后InnoDB存储引擎会批量进行更新到辅助表中。

当数据库宕机时,尚未落盘的索引缓存数据会自动读取并存储,配置参数innodb_ft_cache_size控制缓存的大小,默认为32M,提高该值,可以提高全文检索的性能,但在故障时,需要更久的时间恢复。

在删除数据时,InnoDB不会删除索引数据,而是保存在DELETED辅助表中,因此一段时间后,索引会变得非常大,可以通过optimize table命令手动删除无效索引记录。如果需要删除的内容非常多,会影响应用程序的可用性,参数innodb_ft_num_word_optimize控制每次删除的分词数量,默认为2000,用户可以调整该参数来控制删除幅度。

全文检索限制

全文检索存在一个黑名单列表(stopword list),该列表中的词不需要进行索引分词,默认共有36个,如the单词。你可以自行调整:

mysql> select * from information_schema.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a     |
| about |
| an    |
| are   |
| as    |
| at    |
| be    |
| by    |
| com   |
| de    |
| en    |
| for   |
| from  |
| how   |
| i     |
| in    |
| is    |
| it    |
| la    |
| of    |
| on    |
| or    |
| that  |
| the   |
| this  |
| to    |
| was   |
| what  |
| when  |
| where |
| who   |
| will  |
| with  |
| und   |
| the   |
| www   |
+-------+
36 rows in set (0.00 sec)
Copy after login

其他限制还有:

 ● 每张表只能有一个全文检索索引

 ● 多列组合的全文检索索引必须使用相同的字符集和字符序,不了解的可以参考MySQL乱码的原因和设置UTF8数据格式

 ● 不支持没有单词界定符(delimiter)的语言,如中文、日语、韩语等

全文检索

我们创建一个全文索引:

mysql> create fulltext index fullindex_remark on dimensionsConf(remark);
Query OK, 0 rows affected, 1 warning (0.39 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+
1 row in set (0.00 sec)
Copy after login

全文检索有两种方法:

 ● 自然语言(Natural Language),默认方法,可省略:(IN NATURAL LANGUAE MODE)

 ● 布尔模式(Boolean Mode)(IN BOOLEAN MODE)

自然语言还支持一种扩展模式,后面加上:(WITH QUERY EXPANSION)

其语法为MATCH()...AGAINST()MATCH指定被查询的列,AGAINST指定何种方法查询。

自然语言检索

mysql> select remark from dimensionsConf where remark like '%baby%';
+-------------------+
| remark            |
+-------------------+
| a baby like panda |
| a baby like panda |
+-------------------+
2 rows in set (0.00 sec)

mysql> select remark from dimensionsConf where match(remark) against('baby' IN NATURAL LANGUAGE MODE);
+-------------------+
| remark            |
+-------------------+
| a baby like panda |
| a baby like panda |
+-------------------+
2 rows in set (0.00 sec)

# 查看下执行计划,使用了全文索引排序
mysql> explain select * from dimensionsConf where match(remark) against('baby');
+----+-------------+----------------+----------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table          | type     | possible_keys    | key              | key_len | ref  | rows | Extra       |
+----+-------------+----------------+----------+------------------+------------------+---------+------+------+-------------+
|  1 | SIMPLE      | dimensionsConf | fulltext | fullindex_remark | fullindex_remark | 0       | NULL |    1 | Using where |
+----+-------------+----------------+----------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
Copy after login

我们也可以查看各行数据的相关性,是一个非负的浮点数,0代表没有相关性:

mysql> select id,remark,match(remark) against('baby') as relevance from dimensionsConf;
+-----+-----------------------+--------------------+
| id  | remark                | relevance          |
+-----+-----------------------+--------------------+
| 106 | c                     |                  0 |
| 111 | 运营商             |                  0 |
| 115 | a baby like panda     | 2.1165735721588135 |
| 116 | a baby like panda     | 2.1165735721588135 |
+-----+-----------------------+--------------------+
4 rows in set (0.01 sec)
Copy after login

布尔模式检索

MySQL也允许用修饰符来进行全文检索,其中特殊字符会有特殊含义:

  • +:word必须存在
  • -:word必须排除
  • (no operator):word可选,如果出现,相关性更高
  • @distance: 查询的多个单词必须在指定范围之内
  • >: 出现该单词时增加相关性
  • <:> 出现该单词时降低相关性</:>
  • ~: 出现该单词时相关性为负
  • *: 以该单词开头的单词
  • ": 表示短语
# 代表必须有a baby短语,不能有man,可以有lik开头的单词,可以有panda,
select remark from dimensionsConf where match(remark) against('+"a baby" -man lik* panda' IN BOOLEAN MODE);
Copy after login

扩展查询

当查询的关键字太短或不够清晰时,需要用隐含知识来进行检索,如database关联的MySQL/DB2等。但这个我并没太明白怎么使用,后续补充吧。

类似的使用是:

select * from articles where match(title,body) against('database' with query expansion);
Copy after login

推荐学习:MySQL教程

The above is the detailed content of MySQL InnoDB index principles and algorithms. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

MySQL: An Introduction to the World's Most Popular Database MySQL: An Introduction to the World's Most Popular Database Apr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

MySQL's Place: Databases and Programming MySQL's Place: Databases and Programming Apr 13, 2025 am 12:18 AM

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

Why Use MySQL? Benefits and Advantages Why Use MySQL? Benefits and Advantages Apr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

How to connect to the database of apache How to connect to the database of apache Apr 13, 2025 pm 01:03 PM

Apache connects to a database requires the following steps: Install the database driver. Configure the web.xml file to create a connection pool. Create a JDBC data source and specify the connection settings. Use the JDBC API to access the database from Java code, including getting connections, creating statements, binding parameters, executing queries or updates, and processing results.

How to start mysql by docker How to start mysql by docker Apr 15, 2025 pm 12:09 PM

The process of starting MySQL in Docker consists of the following steps: Pull the MySQL image to create and start the container, set the root user password, and map the port verification connection Create the database and the user grants all permissions to the database

MySQL's Role: Databases in Web Applications MySQL's Role: Databases in Web Applications Apr 17, 2025 am 12:23 AM

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

How to install mysql in centos7 How to install mysql in centos7 Apr 14, 2025 pm 08:30 PM

The key to installing MySQL elegantly is to add the official MySQL repository. The specific steps are as follows: Download the MySQL official GPG key to prevent phishing attacks. Add MySQL repository file: rpm -Uvh https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm Update yum repository cache: yum update installation MySQL: yum install mysql-server startup MySQL service: systemctl start mysqld set up booting

Laravel Introduction Example Laravel Introduction Example Apr 18, 2025 pm 12:45 PM

Laravel is a PHP framework for easy building of web applications. It provides a range of powerful features including: Installation: Install the Laravel CLI globally with Composer and create applications in the project directory. Routing: Define the relationship between the URL and the handler in routes/web.php. View: Create a view in resources/views to render the application's interface. Database Integration: Provides out-of-the-box integration with databases such as MySQL and uses migration to create and modify tables. Model and Controller: The model represents the database entity and the controller processes HTTP requests.

See all articles