Let’s take a look at how mysql works when performing a join operation. What are the common join methods?
As shown in the figure, when we perform a connection operation, the table on the left is Driver table, and the table on the right is Driven table
Simple Nested-Loop Join This join operation is to take a record from the driving table and then match the records of the driven table one by one. If the conditions match, the result will be returned. Then, continue to match the next record in the driver table until all the data in the driver table has been matched
Because it is time-consuming to fetch data from the driver table each time, MySQL does not use this algorithm To perform the join operation
In order to avoid the time-consuming fetching data from the driver table each time, we can add a batch of The data is retrieved from the driver table at once and matched in memory. After this batch of data is matched, a batch of data is fetched from the driver table and placed in the memory until all the data in the driver table is matched.
Batch data retrieval can reduce a lot of IO operations, so the execution efficiency is relatively high. This kind of connection operation is also used by MySQL
By the way, this memory has a proper name in MySQ, called join buffer. We can execute the following statement to view the size of the join buffer
show variables like '%join_buffer%'
Move out the single_table table we used before, create 2 tables based on the single_table table, and insert 1w random records into each table
CREATE TABLE single_table ( id INT NOT NULL AUTO_INCREMENT, key1 VARCHAR(100), key2 INT, key3 VARCHAR(100), key_part1 VARCHAR(100), key_part2 VARCHAR(100), key_part3 VARCHAR(100), common_field VARCHAR(100), PRIMARY KEY (id), KEY idx_key1 (key1), UNIQUE KEY idx_key2 (key2), KEY idx_key3 (key3), KEY idx_key_part(key_part1, key_part2, key_part3) ) Engine=InnoDB CHARSET=utf8; create table t1 like single_table; create table t2 like single_table;
If you use the join statement directly, MySQL optimization The server may select table t1 or t2 as the driving table, which will affect our process of analyzing sql statements, so we use straight_join to let mysql use a fixed connection method to execute the query
select * from t1 straight_join t2 on (t1.common_field = t2.common_field)
The running time is 0.035s
The execution plan is as follows
Using join buffer is seen in the Extra column, indicating that the connection operation is based on Block Nested -Loop Join Algorithm
After understanding the Block Nested-Loop Join algorithm, you can see that each record in the driver table will Matching all the records in the driven table is very time-consuming. Can the efficiency of matching in the driven table be improved?
I guess you have also thought of this algorithm, which is to add indexes to the columns connected by the driven table, so that the matching process is very fast, as shown in the figure
Let’s take a look at how fast it is to perform queries based on joins based on index columns?
select * from t1 straight_join t2 on (t1.id = t2.id)
The execution time is 0.001 seconds. It can be seen that it is more than one level faster than connecting based on ordinary columns.
The execution plan is as follows
Not all columns of the driver table records will be put into the join buffer. Only the columns in the query list and the columns in the filter conditions will be put into the join buffer. Therefore We don't want to use * as the query list, we just need to put the columns we care about in the query list, so that more records can be placed in the join buffer
Now that we know the specific implementation of join, let’s talk about a common question, that is, how to choose the driver table?
If it is Block Nested-Loop Join algorithm:
When the join buffer is large enough, it does not matter who does the driving table
When the join buffer is not large enough, a small table should be selected as the driving table (the small table has less data and the number of times it is put into the join buffer is small, which reduces the number of scans of the table)
If it is Index Nested-Loop Join algorithm
Assume that the number of rows in the driver table is M, so M rows of the driver table need to be scanned
Every time from When obtaining a row of data from the driven table, you need to first search for index a, and then search for the primary key index. The number of rows in the driven table is N. The approximate complexity of searching a tree each time is the logarithm of base 2 N, so the time complexity of searching a row on the driven table is 2∗ l o g 2 N 2*log2^N 2∗log2N
Each row of data in the driver table must be searched once in the driven table. The approximate complexity of the entire execution process is M M ∗ 2 ∗ l o g 2 N M M*2*log2^N M M∗2∗log2N
Obviously M has a greater impact on the number of scanned rows, so the small table should be used as the driving table. Of course, the premise of this conclusion is that the index of the driven table can be used
In short, we can just let the small table be the driving table
When the join statement is executed slowly, we can optimize it through the following methods
When performing a connection operation, the index of the driven table can be used
The small table is used as the driving table
Increase the size of the join buffer
Do not use * as the query list, only return the required columns
The above is the detailed content of How to optimize the join statement in MySQL. For more information, please follow other related articles on the PHP Chinese website!