Briefly talk about join query in MySQL
This article brings you relevant knowledge about mysql, which mainly introduces related issues about join query. Let’s take a look at it together. I hope it will be helpful to everyone.
Recommended learning: mysql video tutorial
The impact of index on join query
Data preparation
Assume that there are two tables t1 and t2. Both tables have primary key index id and index fields a and b fields without indexes. Then insert 100 rows of data into the t1 table and into the t2 table. Insert 1000 rows of data for experiment
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Read a row of data r from table t1
Get field a from data r to table t2 for matching
Take out the rows that meet the conditions in the t2 table, and form a row with r as part of the result set
Repeat steps 1-3 until table t1 loops data
This process is called Index Nested-Loop Join. In this process, the driver table t1 performs a full table scan. Because we inserted 100 rows of data into the t1 table, this scan row The number is 100, and when performing a join query, each row of the t1 table needs to be searched in the t2 table. An index tree search is used. Because the data we construct are in one-to-one correspondence, each search only scans One row, that is, the t2 table also scans a total of 100 rows. The total number of rows scanned during the entire query process is 100 100=200 rows.
No index query process
1 |
|
It can be seen that since there is no index on field B of t2 table, according to the above SQL During execution, a full table scan must be performed every time t1 matches t2. In this way, t2 must be scanned up to 100 times, and the total number of scans is 100*1000 = 100,000 rows.
Of course, this query result is still based on the fact that the two tables we built are small. If it is a table with 100,000 rows, 10 billion rows will need to be scanned. This is terrible!
2. UnderstandBlock Nested-Loop Join
##Block Nested-Loop JoinQuery process
So there is no index on the driven table, how did all this happen? In fact, when there is no available index on the driven table, the algorithm process is as follows:- Read the data of t1 into the thread memory join_buffer, because we wrote above is select * from, so it is equivalent to putting the entire t1 table into the memory;
- The process of scanning t2 is actually taking out each row of t2 and matching it with the rows in join_buffer The data is compared and those that meet the join conditions are returned as part of the result set.
Join_buffer
As we know from the above, without an index, MySQL reads the data into the memory for loop judgment, so this memory is definitely not unlimited. You use, then we need to use a parameter join_buffer_size, the value default size is 256k, as shown below:1 |
|
扫描表 t1,顺序读取数据行放入 join_buffer 中,直至加载完第 80 行满了
扫描表 t2,把 t2 表中的每一行取出来跟 join_buffer 中的数据做对比,将满足条件的数据作为结果集的一部分返回
清空 join_buffer
继续扫描表 t1,顺序读取剩余的数据行放入 join_buffer 中,执行步骤 2
这个流程体现了算法名称中 Block 的由来,分块 join,可以看出虽然查询过程中 t1 被分成了两次放入 join_buffer 中,导致 t2 表被扫描了 2次,但是判断等值条件的次数还是不变的,依然是(80+20)*1000=10 万次。
所以这就是有时候 join 查询很慢,有些大佬会让你把 join_buffer_size 调大的原因。
如何正确的写出 join 查询
驱动表的选择
有索引的情况下
在这个 join 语句执行过程中,驱动表是走全表扫描,而被驱动表是走树搜索。
假设被驱动表的行数是 M,每次在被驱动表查询一行数据,先要走索引 a,再搜索主键索引。每次搜索一棵树近似复杂度是以 2为底的 M的对数,记为 log2M,所以在被驱动表上查询一行数据的时间复杂度是 2*log2M。
假设驱动表的行数是 N,执行过程就要扫描驱动表 N 行,然后对于每一行,到被驱动表上 匹配一次。因此整个执行过程,近似复杂度是 N + N2log2M。显然,N 对扫描行数的影响更大,因此应该让小表来做驱动表。
那没有索引的情况
上述我知道了,因为 join_buffer 因为存在限制,所以查询的过程可能存在多次加载 join_buffer,但是判断的次数都是 10 万次,这种情况下应该怎么选择?
假设,驱动表的数据行数是 N,需要分 K 段才能完成算法流程,被驱动表的数据行数是 M。这里的 K不是常数,N 越大 K就越大,因此把 K 表示为λ*N,显然λ的取值范围 是 (0,1)。
扫描的行数就变成了 N+λNM,显然内存的判断次数是不受哪个表作为驱动表而影响的,而考虑到扫描行数,在 M和 N大小确定的情况下,N 小一些,整个算是的结果会更小,所以应该让小表作为驱动表
总结:真相大白了,不管是有索引还是无索引参与 join 查询的情况下都应该是使用小表作为驱动表。
什么是小表
还是以上面表 t1 和表 t2 为例子:
1 |
|
The above is the detailed content of Briefly talk about join query in MySQL. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



MySQL is an open source relational database management system. 1) Create database and tables: Use the CREATEDATABASE and CREATETABLE commands. 2) Basic operations: INSERT, UPDATE, DELETE and SELECT. 3) Advanced operations: JOIN, subquery and transaction processing. 4) Debugging skills: Check syntax, data type and permissions. 5) Optimization suggestions: Use indexes, avoid SELECT* and use transactions.

You can open phpMyAdmin through the following steps: 1. Log in to the website control panel; 2. Find and click the phpMyAdmin icon; 3. Enter MySQL credentials; 4. Click "Login".

Create a database using Navicat Premium: Connect to the database server and enter the connection parameters. Right-click on the server and select Create Database. Enter the name of the new database and the specified character set and collation. Connect to the new database and create the table in the Object Browser. Right-click on the table and select Insert Data to insert the data.

MySQL and SQL are essential skills for developers. 1.MySQL is an open source relational database management system, and SQL is the standard language used to manage and operate databases. 2.MySQL supports multiple storage engines through efficient data storage and retrieval functions, and SQL completes complex data operations through simple statements. 3. Examples of usage include basic queries and advanced queries, such as filtering and sorting by condition. 4. Common errors include syntax errors and performance issues, which can be optimized by checking SQL statements and using EXPLAIN commands. 5. Performance optimization techniques include using indexes, avoiding full table scanning, optimizing JOIN operations and improving code readability.

You can create a new MySQL connection in Navicat by following the steps: Open the application and select New Connection (Ctrl N). Select "MySQL" as the connection type. Enter the hostname/IP address, port, username, and password. (Optional) Configure advanced options. Save the connection and enter the connection name.

Recovering deleted rows directly from the database is usually impossible unless there is a backup or transaction rollback mechanism. Key point: Transaction rollback: Execute ROLLBACK before the transaction is committed to recover data. Backup: Regular backup of the database can be used to quickly restore data. Database snapshot: You can create a read-only copy of the database and restore the data after the data is deleted accidentally. Use DELETE statement with caution: Check the conditions carefully to avoid accidentally deleting data. Use the WHERE clause: explicitly specify the data to be deleted. Use the test environment: Test before performing a DELETE operation.

Redis uses a single threaded architecture to provide high performance, simplicity, and consistency. It utilizes I/O multiplexing, event loops, non-blocking I/O, and shared memory to improve concurrency, but with limitations of concurrency limitations, single point of failure, and unsuitable for write-intensive workloads.

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.
