How to create high-performance indexes for MySQL
1 Index Basics
1.1 Index Function
In MySQL, when searching for data, first find the corresponding value in the index, and then Find the corresponding data row according to the matching index record. If you want to run the following query statement:
SELECT * FROM USER WHERE uid = 5;
If there is an index built on uid, MySQL will use the index to first find the row with uid 5, that is to say MySQL first searches by value on the index and then returns all rows containing that value.
1.2 Commonly used data structures for MySQL indexes
MySQL indexes are implemented at the storage engine level, not on the server. Therefore, there is no unified indexing standard: indexes in different storage engines work differently.
1.2.1 B-Tree
Most MySQL engines support this kind of index B-Tree. Even if multiple storage engines support the same type of index, their underlying implementation may be different. . For example, InnoDB uses B Tree.
Storage engines implement B-Tree in different ways, with different performances and advantages. For example, MyISAM uses prefix compression technology to make indexes smaller, while InnoDB stores the data according to the original data format. The MyISAM index refers to the indexed rows by the physical location of the data, while InnoDB applies the indexed rows according to the component.
All B-Tree values are stored sequentially, and the distance from each leaf page to the root is the same. The following figure roughly reflects how the InnoDB index works. The structure used by MyISAM is different. But the basic implementation is similar.
Example diagram description:
Each node occupies one disk block, and there are two ascending sorting keys and three pointing subtrees on one node. The pointer of the root node, which stores the address of the disk block where the child node is located. The three range fields divided by the two keywords correspond to the range fields of the data of the subtree pointed to by the three pointers. Taking the root node as an example, the keywords are 16 and 34, the data range of the subtree pointed by the P1 pointer is less than 16, the data range of the subtree pointed by the P2 pointer is 16~34, and the data range of the subtree pointed by the P3 pointer is greater than 34. Keyword search process:
Find disk block 1 based on the root node and read it into memory. [Disk I/O operation 1st time]
Compare keyword 28 In the interval (16,34), find the pointer P2 of disk block 1.
Find disk block 3 based on the P2 pointer and read it into memory. [Disk I/O operation 2nd time]
Compare keyword 28 In the interval (25,31), find the pointer P2 of disk block 3.
Find disk block 8 based on the P2 pointer and read it into memory. [Disk I/O operation 3rd]
Keyword 28 was found in the keyword list in disk block 8.
Disadvantages:
Each node has a key and also contains data, and each page has storage space It is limited. If the data is relatively large, the number of keys stored in each node will become smaller;
When the amount of stored data is large, the depth will be larger and increase. The number of disk IO times during querying will affect query performance.
1.2.2 B Tree Index
B tree is a variant of B tree. Difference from B-tree: B-tree only stores data in leaf nodes, and non-leaf nodes only store key values and pointers.
There are two pointers on the B tree, one points to the root leaf node and the other points to the leaf node with the smallest key, and there is a chain ring structure between all leaf nodes (i.e. data nodes). Therefore, two search operations can be performed on the B-tree: one is a range search for components, and the other is a random search starting from the root node.
B* tree is similar to B number. The difference is that B* number also has a chain ring structure between non-leaf nodes.
1.2.3 Hash Index
Hash index is based on the hash table. Only queries that accurately match all columns of the index are valid. For each row of data, the storage engine will calculate a hash code for all index columns. The hash code is a smaller value, and the hash codes calculated for rows with different key values are also different. A hash index stores all the hash codes in the index and a pointer to each data row in the hash table.
In MySQL, only the default index type of Memory is the hash index used, and memory also supports B-Tree indexes. At the same time, the Memory engine supports non-unique hash indexes. If the hash values of multiple columns are the same, the index will store multiple pointers in the same hash entry in a linked list. Similar to HashMap.
Advantages:
The index itself only needs to store the corresponding hash value, so the structure of the index is very compact, and hashing speeds up searches very fast.
Disadvantages:
If you use hash storage, you need to add all data files to the memory, which consumes more memory space;
Hash index data is not stored in order, so it cannot be used for sorting;
If all queries are equivalent queries, then hashing is indeed very fast, but in an enterprise or actual working environment, more data is searched in ranges rather than equivalent queries, so hashing is not So suitable;
If there are many hash conflicts, the cost of index maintenance operations will also be very high. This is also the problem of Hash conflicts solved by adding red-black trees in the later stage of HashMap;
2 High-performance index strategy
2.1 Clustered index and non-clustered index
Clustered index
is not a separate index type , but a data storage method. In the InnoDB storage engine, the clustered index actually stores key values and data rows in the same structure. When a table has a clustered index, its data rows are actually stored in the leaf pages of the index. Because data rows cannot be stored in different places at the same time, there can only be one clustered index in a table (index coverage can simulate the situation of multiple clustered indexes).
Advantages of clustered index:
Can save related data together; data access is faster because the index and data are saved in the same tree; Queries using covering index scans can directly use the primary key value in the page node;
Disadvantages:
Clustered data maximizes the performance of IO-intensive applications. If the data is all in memory, then the clustered index has no advantage; the insertion speed depends heavily on the insertion order, and inserting in the order of the primary key is the fastest way; updating the clustered index column is very expensive, because each updated row will be forced to be Move to a new location; a table based on a clustered index may face page splitting when new rows are inserted or the primary key is updated and rows need to be moved; clustered indexes may cause full table scans to slow down, especially row comparisons Sparse, or when data storage is discontinuous due to page splits;
Non-clustered index
Data files and index files are stored separately
2.2 Prefix index
Sometimes it is necessary to index very long strings, which will make the index large and slow. Usually, you can use part of the string at the beginning of a column, which greatly saves index space and improves index efficiency, but this will Reduce index selectivity. Index selectivity refers to the ratio of unique index values (also called cardinality) to the total number of data table records, ranging from 1/#T to 1. The higher the selectivity of the index, the higher the query efficiency, because a more selective index allows MySQL to filter out more rows when searching.
Generally, the selectivity of a certain column prefix is high enough to meet the query performance. However, for columns of BLOB, TEXT, and VARCHAR types, prefix indexes must be used because MySQL does not allow indexing of these. The full length of the column. The trick with this method is to choose a prefix that is long enough to ensure high selectivity, but not too long.
Example
Table structure and data download from MySQL official website or GitHub.
city Table Columns
Field name | Meaning |
---|---|
city_id | City primary key ID |
city | City name |
Country ID | |
Creation or last update time |
参数 | 说明 |
---|---|
Handler_read_first | 读取索引第一个条目的次数 |
Handler_read_key | 通过index获取数据的次数 |
Handler_read_last | 读取索引最后一个条目的次数 |
Handler_read_next | 通过索引读取下一条数据的次数 |
Handler_read_prev | 通过索引读取上一条数据的次数 |
Handler_read_rnd | 从固定位置读取数据的次数 |
Handler_read_rnd_next | 从数据节点读取下一条数据的次数 |
The above is the detailed content of How to create high-performance indexes for MySQL. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics





You can open phpMyAdmin through the following steps: 1. Log in to the website control panel; 2. Find and click the phpMyAdmin icon; 3. Enter MySQL credentials; 4. Click "Login".

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

Redis uses a single threaded architecture to provide high performance, simplicity, and consistency. It utilizes I/O multiplexing, event loops, non-blocking I/O, and shared memory to improve concurrency, but with limitations of concurrency limitations, single point of failure, and unsuitable for write-intensive workloads.

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

Apache connects to a database requires the following steps: Install the database driver. Configure the web.xml file to create a connection pool. Create a JDBC data source and specify the connection settings. Use the JDBC API to access the database from Java code, including getting connections, creating statements, binding parameters, executing queries or updates, and processing results.

Effective monitoring of Redis databases is critical to maintaining optimal performance, identifying potential bottlenecks, and ensuring overall system reliability. Redis Exporter Service is a powerful utility designed to monitor Redis databases using Prometheus. This tutorial will guide you through the complete setup and configuration of Redis Exporter Service, ensuring you seamlessly build monitoring solutions. By studying this tutorial, you will achieve fully operational monitoring settings

The process of starting MySQL in Docker consists of the following steps: Pull the MySQL image to create and start the container, set the root user password, and map the port verification connection Create the database and the user grants all permissions to the database
