Completely master the indexing skills of MySQL (summary sharing)

This article covers MySQL indexes, including the logical architecture of MySQL and the execution order of SQL statements. I hope it will be helpful to you.

1. MySQL three-tier logical architecture

The storage engine architecture of MySQL separates query processing from data storage/retrieval. The following is the logical architecture diagram of MySQL:

1. The first layer is responsible for connection management, authorization authentication, security, etc.

Each client connection is handled by a thread on the server. The server caches threads (and can use a thread pool) so that it does not have to create and destroy a thread for every connection. When a client connects, the MySQL server authenticates it, either by username and password or by an SSL certificate. After authentication succeeds, the server also verifies that the client has the privileges needed to execute each query.

2. The second layer is responsible for parsing the query

This layer parses the SQL and optimizes it (for example, by adjusting the order in which tables are read and selecting appropriate indexes). For SELECT statements, the server first checks the query cache before parsing the query; if a matching result is found there, it is returned directly without parsing or optimization. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this step applies only to older versions. Stored procedures, triggers, views, and similar features are all implemented in this layer.

3. The third layer is the storage engine

The storage engine is responsible for storing data in MySQL, extracting data, starting a transaction, etc. The storage engine communicates with the upper layer through APIs. These APIs shield the differences between different storage engines, making these differences transparent to the upper layer query process. The storage engine will not parse SQL.

2. Comparison between InnoDB and MyISAM

1. Storage structure

MyISAM: Each MyISAM table is stored on disk in three files: a table definition file, a data file, and an index file. Each file name begins with the table name, and the extension indicates the file type. The .frm file stores the table definition (MySQL 8.0 replaced .frm files with a data dictionary). The data file has the extension .MYD (MYData), and the index file has the extension .MYI (MYIndex).

InnoDB: All tables are stored in the same data file (or in multiple files, or in independent tablespace files, one per table). The size of an InnoDB table is limited by the operating system's maximum file size, which was 2GB on some older file systems; with the default 16KB page size, InnoDB itself limits a tablespace to 64TB.

2. Storage space

MyISAM: MyISAM supports three storage formats: static (fixed-length) tables, dynamic tables, and compressed tables. The static format is the default when the table has no variable-length columns; note that values are padded with spaces to the column width, so trailing spaces in the data are removed on retrieval. If a table will not be modified after it has been created and loaded, you can use a compressed table to greatly reduce disk space usage.

InnoDB: Requires more memory and storage. It allocates its own dedicated buffer pool in main memory for caching data and indexes.

3. Portability, backup and recovery

MyISAM: Data is stored in the form of files, so it is very convenient for cross-platform data transfer. You can perform operations on a table individually during backup and recovery.

InnoDB: Free solutions include copying the data files, backing up the binlog, or using mysqldump; these become relatively painful once the data volume reaches dozens of gigabytes.

4. Transaction support

MyISAM: The emphasis is on performance. Each query is executed atomically and often runs faster than on InnoDB for simple read-heavy workloads, but MyISAM does not provide transaction support.

InnoDB: Provides transaction support, foreign keys, and other advanced database features. Its tables are transaction-safe (ACID compliant), with commit, rollback, and crash recovery capabilities.
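
The difference in transaction support can be observed with a short experiment. The sketch below assumes a scratch database on a server with default settings and the MyISAM engine available; the table names `t_innodb` and `t_myisam` are hypothetical and differ only in their engine.

```sql
-- Hypothetical scratch tables; the names are illustrative only.
CREATE TABLE t_innodb (id INT PRIMARY KEY) ENGINE = InnoDB;
CREATE TABLE t_myisam (id INT PRIMARY KEY) ENGINE = MyISAM;

START TRANSACTION;
INSERT INTO t_innodb VALUES (1);
INSERT INTO t_myisam VALUES (1);
ROLLBACK;  -- MySQL warns that non-transactional tables could not be rolled back

-- The InnoDB insert is undone; the MyISAM insert was written immediately.
SELECT COUNT(*) FROM t_innodb;  -- expected on a default configuration: 0
SELECT COUNT(*) FROM t_myisam;  -- expected on a default configuration: 1
```

Running `SHOW WARNINGS;` right after the `ROLLBACK` displays the warning about the MyISAM table.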

5. AUTO_INCREMENT

MyISAM: The AUTO_INCREMENT column must be indexed, and it can be part of a joint index with other columns. In a joint index, the AUTO_INCREMENT column does not need to be the first column; the value is then incremented within each group defined by the preceding columns.

InnoDB: The AUTO_INCREMENT column must be indexed, and it must be the first column of at least one index. If it is part of a joint index, it must be the first column of that index (or have another index in which it comes first).
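
As a minimal sketch of the two AUTO_INCREMENT rules, the statements below create one table per engine. The table and index names (`animals_myisam`, `animals_innodb`, `idx_id`) are hypothetical and chosen only for illustration.

```sql
-- MyISAM: AUTO_INCREMENT may be a secondary column of a joint key.
-- The counter then restarts for each distinct value of grp.
CREATE TABLE animals_myisam (
  grp  ENUM('fish', 'mammal', 'bird') NOT NULL,
  id   MEDIUMINT NOT NULL AUTO_INCREMENT,
  name CHAR(30) NOT NULL,
  PRIMARY KEY (grp, id)
) ENGINE = MyISAM;

INSERT INTO animals_myisam (grp, name) VALUES
  ('mammal', 'dog'), ('mammal', 'cat'), ('bird', 'penguin'), ('fish', 'lax');

-- Inspect the generated ids: they are numbered per grp value.
SELECT * FROM animals_myisam ORDER BY grp, id;

-- InnoDB: the AUTO_INCREMENT column must be the first column of some index.
CREATE TABLE animals_innodb (
  grp  ENUM('fish', 'mammal', 'bird') NOT NULL,
  id   MEDIUMINT NOT NULL AUTO_INCREMENT,
  name CHAR(30) NOT NULL,
  PRIMARY KEY (grp, id),
  KEY idx_id (id)   -- without this key, InnoDB rejects the table definition
) ENGINE = InnoDB;
```

In the InnoDB table the counter is global to the table rather than per group, because the column is numbered through its own index.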

6. Table lock differences

MyISAM: Only table-level locks are supported. When users operate on MyISAM tables, SELECT, UPDATE, DELETE, and INSERT statements all lock the table automatically. If the locked table satisfies the conditions for concurrent inserts, new rows can still be inserted at the end of the table.

InnoDB: Support for transactions and row-level locks is InnoDB's most important feature. Row locks greatly improve the performance of concurrent multi-user operations. However, InnoDB sets row locks on index records: if the WHERE condition cannot use an index, InnoDB has to scan and lock every row, which in effect locks the entire table.

7. Full-text index

MyISAM: supports FULLTEXT type full-text index

InnoDB: Versions before MySQL 5.6 do not support FULLTEXT indexes; MySQL 5.6 and later do. On older versions, InnoDB can be combined with the Sphinx search engine for full-text search, which often gives better results.

8. Table primary key

MyISAM: Tables without any index or primary key are allowed. Index entries store the addresses of the rows in the data file.

InnoDB: If no primary key or NOT NULL unique index is defined, a hidden 6-byte row ID (invisible to the user) is generated automatically and used as the primary key. The row data is part of the primary (clustered) index, and each secondary index stores the primary key value.

9. The specific number of rows in the table

MyISAM: Saves the total number of rows in the table. A statement such as select count(*) from table; reads the stored value directly.

InnoDB: The total number of rows is not saved. A statement such as select count(*) from table; has to scan the table (or one of its indexes), which is expensive. Once a where condition is added, however, MyISAM and InnoDB handle the query in the same way.

10. CRUD operation

MyISAM: If the workload consists mostly of SELECTs, MyISAM can be a better choice.

InnoDB: If the workload performs a large number of INSERTs or UPDATEs, use InnoDB tables for performance reasons.

11. Foreign key

MyISAM: Not supported

InnoDB: Supported

3. Introduction to sql optimization

1. Under what circumstances should SQL optimization be performed?

Typical signs are low performance, long execution times, long waiting times, poorly written join queries, and indexes that are not being used.

2. SQL statement execution process

(1) Writing process

select distinct ... from ... join ... on ... where ... group by ... having ... order by ... limit ...

(2) Parsing process

from ... on ... join ... where ... group by ... having ... select distinct ... order by ... limit ...
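
One practical consequence of this parsing order is where a column alias can be referenced. The following sketch uses a hypothetical table `orders_demo`; the table and its columns are assumptions made for illustration.

```sql
-- Hypothetical table for illustration.
CREATE TABLE orders_demo (
  id      INT PRIMARY KEY,
  user_id INT NOT NULL,
  amount  DECIMAL(10, 2) NOT NULL
);

-- Fails with an unknown-column error: WHERE is evaluated before SELECT,
-- so the alias "doubled" does not exist yet.
-- SELECT amount * 2 AS doubled FROM orders_demo WHERE doubled > 100;

-- Works: the expression is repeated in WHERE.
SELECT amount * 2 AS doubled FROM orders_demo WHERE amount * 2 > 100;

-- ORDER BY runs after SELECT, so it can refer to the alias "total".
SELECT user_id, SUM(amount) AS total
FROM orders_demo
WHERE amount > 0             -- filters rows before grouping
GROUP BY user_id
HAVING SUM(amount) > 100     -- filters groups after grouping
ORDER BY total DESC
LIMIT 10;
```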

3. SQL optimization is to optimize the index

An index is like the table of contents of a book.

The data structure behind the index is a B+ tree.

4. Index

1. Advantages of index

(1) Improve query efficiency (reduce IO usage)

(2) Reduce CPU usage

For example, for a query with order by age desc, the B+ tree index is already sorted, so if the query uses the index there is no need to sort the rows again.

2. Disadvantages of indexes

(1) The index itself is large and can be stored in memory or on the hard disk, usually on the hard disk.

(2) Indexes are not suitable in every situation, for example: ① tables with a small amount of data ② fields that change frequently ③ fields that are rarely used in queries

(3) Indexes will reduce the efficiency of additions, deletions and modifications

3. Index classification

(1) Single value index

(2) Unique index

(3) Joint index (an index on multiple columns)

(4) Primary key index

Note: The main difference between a unique index and a primary key index is that a primary key index cannot contain NULL.

4. Create index

alter table user add INDEX `user_index_username_password` (`username`,`password`)
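
For comparison, the other index types from the classification can be created in a similar way. This is a sketch only: apart from username and password, the columns of the `user` table and all index names below are assumptions.

```sql
-- Hypothetical definition of the user table; only username and password
-- appear in the original statement, the other columns are assumed.
CREATE TABLE `user` (
  id       INT NOT NULL,
  username VARCHAR(50) NOT NULL,
  password VARCHAR(64) NOT NULL,
  email    VARCHAR(100),
  age      INT
) ENGINE = InnoDB;

ALTER TABLE `user` ADD PRIMARY KEY (id);                       -- primary key index
ALTER TABLE `user` ADD UNIQUE INDEX user_uniq_email (email);   -- unique index
ALTER TABLE `user` ADD INDEX user_index_age (age);             -- single-value index

-- CREATE INDEX is an alternative spelling of ALTER TABLE ... ADD INDEX.
CREATE INDEX user_index_username_password
  ON `user` (username, password);                              -- joint index

SHOW INDEX FROM `user`;                 -- list the indexes on the table
DROP INDEX user_index_age ON `user`;    -- remove an index again
```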

5. MySQL index principle -> B+ tree

The underlying data structure of a MySQL index is the B+ tree.

The B+Tree is an optimization of the B-Tree that makes it better suited to external (on-disk) index structures. The InnoDB storage engine uses the B+Tree to implement its indexes.

In the B-Tree structure diagram, each node contains not only the key values but also the data values. The storage space of each page is limited, so if the data values are large, the number of keys that fit in each node (that is, one page) is small. When a large amount of data is stored, the B-Tree becomes deeper, which increases the number of disk I/Os per query and hurts query efficiency. In a B+Tree, all data records are stored in key order on leaf nodes at the same level, and non-leaf nodes store only key values. This greatly increases the number of keys stored in each node and reduces the height of the B+Tree.

The B+Tree differs from the B-Tree in several ways:

Non-leaf nodes store only key value information.
There is a link pointer between all leaf nodes.
Data records are stored in leaf nodes.

Applying this optimization to the B-Tree in the previous section: since the non-leaf nodes of a B+Tree store only key value information, and assuming that each disk block can store 4 key values and pointers, the result is the B+Tree structure shown in the figure below:

A B+Tree usually has two head pointers: one points to the root node, and the other points to the leaf node with the smallest key. All leaf nodes (that is, the data nodes) are linked together in a chain. Two kinds of search can therefore be performed on a B+Tree: a range or paging search over the primary key, starting from the smallest leaf, and a random search starting from the root node.

The example above has only 22 data records, so the advantages of the B+Tree are hard to see. Here is a calculation:

The page size in the InnoDB storage engine is 16KB. The primary key of a typical table is INT (4 bytes) or BIGINT (8 bytes), and a pointer is generally 4 or 8 bytes, so one page (one node of the B+Tree) stores approximately 16KB / (8B + 8B) = 1K key values (because this is an estimate, K is taken as 10^3 to simplify the calculation). In other words, a B+Tree index with a depth of 3 can maintain 10^3 * 10^3 * 10^3 = 1 billion records.
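
The estimate can be written out step by step. Here f is the number of key/pointer pairs per page and N_h is the number of records reachable at depth h. The second line is an additional assumption, not part of the estimate above: it supposes that leaf pages hold full rows rather than key/pointer pairs, with a hypothetical r = 16 rows per leaf page (rows of about 1KB).

```latex
\begin{align*}
f &\approx \frac{16\,\mathrm{KB}}{8\,\mathrm{B} + 8\,\mathrm{B}}
   = \frac{16384}{16} = 1024 \approx 10^{3},
&
N_{3} &\approx f^{3} = (10^{3})^{3} = 10^{9} \\
% Assumption: leaf pages store r full rows instead of f key/pointer pairs.
N_{3}' &\approx f^{2} \cdot r = 10^{6} \cdot 16 = 1.6 \times 10^{7}
&
&(r = 16 \text{ rows per leaf page, hypothetical})
\end{align*}
```

Under that assumption a tree of depth 3 holds tens of millions of rows rather than a billion, which is still enough for most tables.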

In practice, nodes are not always completely full, so the height of a B+Tree in a database is generally between 2 and 4 levels. MySQL's InnoDB storage engine is designed to keep the root node resident in memory, which means that only 1 to 3 disk I/O operations are needed to find the row record for a given key value.

B+Tree indexes in the database can be divided into clustered indexes and secondary indexes. The B+Tree example diagram above corresponds to a clustered index: the leaf nodes of the clustered index's B+Tree store the row data of the entire table. A secondary index differs in that its leaf nodes do not contain all the data of the row; instead they store the clustered index key of the corresponding row, that is, the primary key. When querying through a secondary index, InnoDB first traverses the secondary index to find the primary key, and then uses the primary key to find the complete row in the clustered index.
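
The two-step lookup through a secondary index, and the case where it can be skipped, can be inspected with explain. The table `member_demo` and the index `idx_username` below are hypothetical, and the plan actually shown depends on the server version and the data.

```sql
-- Hypothetical table for illustration.
CREATE TABLE member_demo (
  id       INT PRIMARY KEY,        -- clustered index: leaf pages hold full rows
  username VARCHAR(50) NOT NULL,
  email    VARCHAR(100) NOT NULL,
  KEY idx_username (username)      -- secondary index: leaf entries hold (username, id)
) ENGINE = InnoDB;

-- email is not stored in idx_username: after finding the id in the
-- secondary index, InnoDB looks the row up in the clustered index.
EXPLAIN SELECT email FROM member_demo WHERE username = 'alice';

-- id and username are both stored in idx_username, so the clustered index
-- need not be visited; the Extra column typically shows "Using index".
EXPLAIN SELECT id, username FROM member_demo WHERE username = 'alice';
```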

5. How to trigger the joint index

1. Create a joint index on the username and password columns of the user table

2. Trigger the joint index

(1) Using all the index keys of the joint index, connected with and, can trigger the joint index

(2) Using all the index keys of the joint index, but connected with or, cannot trigger the joint index

(3) Using the leftmost field of the joint index alone can trigger the joint index

(4) Using the other fields of the joint index alone, without the leftmost field, cannot trigger the joint index
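
The four cases can be checked with explain. The sketch below uses a hypothetical table `user_demo`; the extra `age` column is an assumption added so that `SELECT *` cannot be answered from the joint index alone. On an empty or very small table the optimizer may choose differently, so it is worth loading representative rows first.

```sql
-- Hypothetical table with a joint index on (username, password).
CREATE TABLE user_demo (
  id       INT PRIMARY KEY,
  username VARCHAR(50) NOT NULL,
  password VARCHAR(64) NOT NULL,
  age      INT,
  KEY user_index_username_password (username, password)
) ENGINE = InnoDB;

-- (1) All index keys joined with AND: the key column typically shows the joint index.
EXPLAIN SELECT * FROM user_demo WHERE username = 'alice' AND password = 'secret';

-- (2) All index keys joined with OR: typically a full scan (type ALL, key NULL).
EXPLAIN SELECT * FROM user_demo WHERE username = 'alice' OR password = 'secret';

-- (3) Leftmost field alone: the joint index can still be used.
EXPLAIN SELECT * FROM user_demo WHERE username = 'alice';

-- (4) Non-leftmost field alone: the joint index cannot be used for the lookup.
EXPLAIN SELECT * FROM user_demo WHERE password = 'secret';
```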

6. Analyze the sql execution plan---explain

explain shows how MySQL's optimizer plans to execute an SQL statement, without actually running it.

1. Introduction to using explain

(1) User table

(2) Department table

(3) Untriggered index

(4) Triggered index

(5) Result analysis

The table that appears in the first row of the explain output is the driving table.

When a join condition is specified, the table with fewer rows satisfying the query conditions is the [driving table].
When no join condition is specified, the table with fewer rows is the [driving table].

Sorting directly on the driving table can trigger the index, while sorting on a non-driving table cannot.
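
Which table is read first, and how the sort is carried out, can be examined on a pair of small tables. The tables `dept_demo` and `staff_demo` below are hypothetical stand-ins for the department and user tables; the plans depend on the data and the server version, so no particular output is claimed.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE dept_demo (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE staff_demo (
  id       INT PRIMARY KEY,
  dept_id  INT NOT NULL,
  username VARCHAR(50) NOT NULL,
  KEY idx_dept (dept_id)
) ENGINE = InnoDB;

-- The optimizer picks the join order; the first row of the output is the
-- table that is read first.
EXPLAIN
SELECT s.username, d.name
FROM staff_demo s
JOIN dept_demo d ON d.id = s.dept_id
ORDER BY d.id;

-- STRAIGHT_JOIN forces the left table (staff_demo) to be read first.
-- Compare the Extra column for "Using filesort" or "Using temporary"
-- when sorting on the first table versus the second.
EXPLAIN
SELECT s.username, d.name
FROM staff_demo s
STRAIGHT_JOIN dept_demo d ON d.id = s.dept_id
ORDER BY s.id;
```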

2. Introduction to explain query results

(1) id: SELECT identifier. This is the sequence number of the SELECT within the query.

(2) select_type: SELECT type:

SIMPLE: Simple SELECT (does not use UNION or subqueries)
PRIMARY: The outermost SELECT
UNION: The second or a subsequent SELECT statement in a UNION
DEPENDENT UNION: The second or a subsequent SELECT statement in a UNION, dependent on the outer query
UNION RESULT: The result of a UNION
SUBQUERY: The first SELECT in a subquery
DEPENDENT SUBQUERY: The first SELECT in a subquery, dependent on the outer query
DERIVED: SELECT of a derived table (subquery in the FROM clause)

(3) table: table name

(4) type: join type (how the table is accessed)

system: The table has only one row (=system table). This is a special case of the const join type.
const: The table has at most one matching row, which will be read at the beginning of the query. Because there is only one row, the column values in this row can be treated as constants by the rest of the optimizer. const is used when comparing all parts of a PRIMARY KEY or UNIQUE index with a constant value.
eq_ref: For each combination of rows from the previous table, read one row from this table. This is probably the best join type, besides const types. It is used when all parts of an index are used in the join and the index is UNIQUE or PRIMARY KEY. eq_ref can be used on indexed columns compared using the = operator. The comparison value can be a constant or an expression that uses a column from a table that was read before this table.
ref: For each combination of rows from the previous table, all rows with matching index values will be read from this table. Use ref if the join uses only the leftmost prefix of the key, or if the key is not UNIQUE or PRIMARY KEY (in other words, if the join cannot select a single row based on the key). This join type is good if you are using keys that only match a small number of rows. ref can be used on indexed columns using the = or <=> operators.
ref_or_null: This join type is like ref, but MySQL additionally searches for rows that contain NULL values. This optimization is most often used in resolving subqueries.
index_merge: This join type indicates that the index merge optimization is used. In this case, the key column contains the list of indexes used, and key_len contains the longest key parts of the indexes used.
unique_subquery: This type replaces ref for IN subqueries of the following form: value IN (SELECT primary_key FROM single_table WHERE some_expr). unique_subquery is an index lookup function that completely replaces the subquery for better efficiency.
index_subquery: This join type is similar to unique_subquery. It replaces IN subqueries, but works for non-unique indexes in subqueries of the following form: value IN (SELECT key_column FROM single_table WHERE some_expr)
range: Only rows in a given range are retrieved, using an index to select the rows. The key column shows which index is used, and key_len contains the longest key part that is used. The ref column is NULL for this type. range can be used when a key column is compared with a constant using the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN operators.
index: This join type is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL because the index is usually smaller than the data file.
ALL: A full table scan is performed for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. You can usually avoid ALL by adding indexes that allow rows to be retrieved based on constant values or column values from earlier tables.

(5) possible_keys: The possible_keys column indicates which indexes MySQL could use to find rows in this table. Note that this column is completely independent of the order of the tables shown in the EXPLAIN output. This means that some of the keys in possible_keys might not be usable in practice with the generated table order.

(6) key: The key column shows the key (index) that MySQL actually decided to use. If no index was chosen, key is NULL. To force MySQL to use or ignore an index listed in the possible_keys column, use FORCE INDEX, USE INDEX, or IGNORE INDEX in the query.
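
The three index hints are written directly after the table name. The sketch below uses a hypothetical table `account_demo` with two single-column indexes; the names are assumptions, and hints are best reserved for cases where the optimizer's own choice has been shown to be worse.

```sql
-- Hypothetical table with two candidate indexes.
CREATE TABLE account_demo (
  id       INT PRIMARY KEY,
  username VARCHAR(50) NOT NULL,
  age      INT NOT NULL,
  KEY idx_username (username),
  KEY idx_age (age)
) ENGINE = InnoDB;

-- USE INDEX: consider only the named index (a table scan is still allowed).
EXPLAIN SELECT * FROM account_demo USE INDEX (idx_username)
WHERE username = 'alice' AND age = 30;

-- FORCE INDEX: like USE INDEX, but a table scan is treated as very expensive.
EXPLAIN SELECT * FROM account_demo FORCE INDEX (idx_age)
WHERE username = 'alice' AND age = 30;

-- IGNORE INDEX: do not use the named indexes.
EXPLAIN SELECT * FROM account_demo IGNORE INDEX (idx_username, idx_age)
WHERE username = 'alice' AND age = 30;
```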

(7) key_len: The key_len column shows the length, in bytes, of the key that MySQL decided to use. If key is NULL, the length is NULL. The key_len value makes it possible to determine how many parts of a multi-part key MySQL actually uses.
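
As a worked illustration of reading key_len, consider a hypothetical table `login_demo` with a two-part index on utf8mb4 columns. The arithmetic assumes that a VARCHAR key part is counted as its maximum length in characters times 4 bytes for utf8mb4, plus 2 length bytes, plus 1 more byte if the column is nullable.

```sql
-- Hypothetical table; both indexed columns are NOT NULL and utf8mb4.
CREATE TABLE login_demo (
  id       INT PRIMARY KEY,
  username VARCHAR(50) NOT NULL,
  password VARCHAR(64) NOT NULL,
  KEY idx_name_pwd (username, password)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

-- Only the first key part is used:
--   username: 50 * 4 + 2 = 202  ->  key_len should be 202
EXPLAIN SELECT * FROM login_demo WHERE username = 'alice';

-- Both key parts are used:
--   username: 202, password: 64 * 4 + 2 = 258  ->  key_len should be 460
EXPLAIN SELECT * FROM login_demo
WHERE username = 'alice' AND password = 'secret';
```

A key_len of 202 on a query that filters on both columns would therefore suggest that only the username part of the index is being used.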

(8) ref: The ref column shows which column or constant is used with key to select rows from the table.

(9) rows: The rows column shows the number of rows MySQL estimates it must examine to execute the query.

(10) Extra: This column contains additional details about how MySQL resolves the query.

Distinct: After MySQL finds the first matching row, it stops searching for more rows for the current row combination.
Not exists: MySQL was able to apply a LEFT JOIN optimization to the query: after finding one row that matches the LEFT JOIN criteria, it does not examine more rows in this table for the previous row combination.
range checked for each record (index map: #): MySQL found no good index to use, but found that some of the indexes might be usable once the column values from the preceding tables are known. For each row combination in the preceding tables, MySQL checks whether it can use the range or index_merge access method to retrieve rows.
Using filesort: MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and a pointer to the row for all rows that match the WHERE clause. The keys are then sorted and the rows are retrieved in sorted order.
Using index: The column information is retrieved using only the information in the index tree, without an additional lookup to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
Using temporary: To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens when the query contains GROUP BY and ORDER BY clauses that list columns differently.
Using where: A WHERE clause is used to restrict which rows are matched against the next table or sent to the client. Unless you specifically intend to fetch or examine all rows from the table, something may be wrong in the query if the Extra value is not Using where and the table join type is ALL or index.
Using sort_union(...), Using union(...), Using intersect(...): These indicate how index scans are merged for the index_merge join type.
Using index for group-by: Similar to the Using index access method, Using index for group-by means that MySQL found an index that can be used to retrieve all columns of a GROUP BY or DISTINCT query without any extra disk access to the actual table. In addition, the index is used in the most efficient way, so that for each group only a few index entries are read.

You can get a good indication of how efficient a join is by multiplying all the values in the rows column of the EXPLAIN output. The product tells you roughly how many rows MySQL must examine to execute the query. This number is also used when you restrict queries with the max_join_size variable, to determine which multi-table SELECT statements to execute and which to abort.
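
As a rough sketch of how the limit is applied in a session, the statements below set and then reset max_join_size. The threshold of one million is an arbitrary example value, and the row counts in the comment are made up for illustration.

```sql
-- Example: an EXPLAIN with three rows whose rows values are 10, 500 and 2
-- gives an estimate of 10 * 500 * 2 = 10000 row combinations.

-- Reject SELECT statements whose estimate exceeds one million
-- (arbitrary example threshold).
SET SESSION max_join_size = 1000000;

-- Setting max_join_size to a non-default value also sets sql_big_selects to 0.
SELECT @@SESSION.max_join_size, @@SESSION.sql_big_selects;

-- Restore the defaults for this session.
SET SESSION max_join_size = DEFAULT;
SET SESSION sql_big_selects = 1;
```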
