Home Database Mysql Tutorial Detailed introduction to the principle of MySQL JOIN

Detailed introduction to the principle of MySQL JOIN

Jul 20, 2017 pm 03:31 PM
join mysql principle

one. Overview of Join syntax

join is used to connect fields in multiple tables. The syntax is as follows:

... FROM table1 INNER|LEFT|RIGHT JOIN table2 ON conditiona

table1: left table; table2: right table.

JOIN is roughly divided into the following three categories according to its functions:

INNER JOIN (inner join, or equivalent join): obtains records with a connection matching relationship in two tables.

LEFT JOIN (Left Join): Get the complete records of the left table (table1), that is, there is no corresponding matching record in the right table (table2).

RIGHT JOIN (right join): Contrary to LEFT JOIN, complete records of the right table (table2) are obtained, that is, there is no matching corresponding record in the left table (table1).

Note: mysql does not support Full join, but you can use the UNION keyword to combine LEFT JOIN and RIGHT JOIN to simulate a FULL join.

First look at the two tables in the experiment:

Table comments, the total number of rows is 28856
Table comments_for, the total number of rows is 57, comments_id is indexed, and the ID column is the primary key.
The above two tables are the basis for our testing, and then look at the index. The comments_for table comments_id is indexed, and the ID is the primary key.
Recently, a developer in the company asked me about MySQL JOIN JOIN. In detail, I told you that my understanding of MySQL JOIN is not very deep, so I also checked a lot of documents, and finally posted it on the InsideMySQL official account. I saw two analysis articles about JOIN and felt they were very well written. Let me share my actual test of JOIN. Let's first introduce MySQL's JOIN algorithm, which is divided into three types (source: InsideMySQL):
MySQL only supports one JOIN algorithm, Nested-Loop Join (nested loop link), unlike other businesses The database can support hash links and merge connections, but MySQL's Nested-Loop Join (nested loop link) also has many variants, which can help MySQL perform JOIN operations more efficiently:
(1) Simple Nested- Loop Join (picture taken from InsideMySQL)
This algorithm is relatively simple. R1 is taken from the driver table to match all columns of the S table, and then R2, R3, until all the data in the R table are matched, and then the data are merged. It can be seen that this algorithm requires RN accesses to the S table. Although it is simple, the overhead is still relatively too high
(2) Index Nested-Loop Join, the implementation method is as follows:
Index nested connection Since there are indexes on the non-driven table, when comparing Instead of comparing records one by one, indexes can be used to reduce comparisons, thus speeding up queries. This is one of the main reasons why we usually require the related fields to have indexes when doing related queries.
When this algorithm performs a link query, the driver table will search based on the index of the associated field. When a matching value is found on the index, it will return to the table for query, that is, only when the index is matched. Only then will the table be returned. As for the selection of the driver table, the MySQL optimizer will generally choose the driver table with a small number of records. However, when the SQL is particularly complex, incorrect selections cannot be ruled out.
In the index nested link mode, if the associated key of the non-driven table is the primary key, the performance will be very high. If it is not the primary key, the number of rows returned will be large if the association is , the efficiency will be particularly low because multiple table return operations are required. First associate the index, and then perform the table return operation based on the primary key ID of the secondary index. In this case, the performance will be relatively poor.
(3) Block Nested-Loop Join, implemented as follows:
When there is an index, MySQL will try to use Index Nested -Loop Join algorithm. In some cases, the Join column may not have an index. In this case, MySQL's choice will definitely not be the Simple Nested-Loop Join algorithm introduced first, but will give priority to Block Nested-Loop Join. algorithm.
Compared with Simple Nested-Loop Join, Block Nested-Loop Join has an additional intermediate processing process, which is the join buffer. Use the join buffer to buffer all the query JOIN related columns of the driver table into the JOIN BUFFER, and then Compare batches with non-driven tables. If this is also implemented, multiple comparisons can be merged into one, reducing the access frequency of non-driven tables. That is, the S table only needs to be accessed once. In this way, the non-driven table will not be accessed multiple times, and only in this case will the join buffer be accessed.
In MySQL, we can set the value of the join buffer through the parameter join_buffer_size, and then perform the operation. By default join_buffer_size=256K, MySQL will cache all required columns into the join buffer during search, including the selected columns, instead of just caching the associated columns. In a SQL with N JOIN associations, N-1 join buffers will be allocated during execution.
The above introduction is over, let’s take a look at the specific examples
(1) Full table JOIN
EXPLAIN SELECT * FROM comments gc
JOIN comments_for gcf ON gc.comments_id=gcf.comments_id;
Copy after login

Look at the output information:
You can see that comments_for is used during the full table scan. In the driver table, because the related fields are indexed, a full index scan is performed on the index idx_commentsid to match the non-driven table comments, and one row can be matched each time. At this time, Index Nested-Loop Join is used, and the entire table is matched through the index. We can see that because the magnitude of the comments_for table is much smaller than comments, MySQL gives priority to the small table comments_for as the driving table.
(2) Full table JOIN + filter conditions
SELECT * FROM comments gc
JOIN comments_for gcf ON gc.comments_id=gcf.comments_id
WHERE gc.comments_id =2056
Copy after login

At this time, Index Nested-Loop Join is used. First, filter the primary key of the driver table comments to match one, and then search and match the index idx_commentsid of the non-driven table comments_for. The final matching result is expected to affect one, so that only the non-driver table comments_for is matched. The idx_commentsid index of the driver table performs an access operation, and the efficiency is relatively high.
(3) Take a look at the situation where the associated field has no index:
EXPLAIN SELECT * FROM comments gc
JOIN comments_for gcf ON gc.order_id=gcf.product_id
Copy after login

Let’s take a look at the execution plan:
We can see from the execution plan that this table JOIN uses Block Nested-Loop Join to perform table association. First, comments_for ( Only 57 rows) This small table is used as the driver table, and then the required data of comments_for is cached in the JOIN buffer, and the comments table is scanned in batches, that is, only one match is performed. The premise is that the join buffer is large enough to store the comments_for cache. data.
And we see a very clear prompt in the execution plan: Using where; Using join buffer (Block Nested Loop)
Generally, if this happens, it proves that our SQL needs to be optimized. .
It should be noted that in this case, MySQL will also choose the violent method of Simple Nested-Loop Join. I haven’t understood how it chooses this optimizer, but it usually uses Block Nested. -Loop Join, because CBO is based on overhead, the performance of Block Nested-Loop Join is much better than that of Simple Nested-Loop Join.
(4) Take a look at left join
EXPLAIN SELECT * FROM comments gc
LEFT JOIN comments_for gcf ON gc.comments_id=gcf.comments_id
Copy after login

Look at the execution plan:
In this case, since our related fields are indexed, we call Index Nested-Loop Join, but when there are no filter conditions, the first table will be selected as the driving table. To perform JOIN, to associate the index of the non-driven table to perform Index Nested-Loop Join.
If you add the filter condition gc.comments_id =2056, this will filter out an Index Nested-Loop Join for the non-driven table, which is very efficient.
If it is the following:
EXPLAIN SELECT * FROM comments_for gcf
LEFT JOIN comments gc ON gc.comments_id=gcf.comments_id
WHERE gcf.comments_id =2056
Copy after login

If you filter through the gcf table, the gcf table will be selected by default As a driver table, because it has obviously been filtered, there will be very few matching conditions. For details, you can look at the execution plan:
At this point, the join is basically very clear, to be continued , everyone is welcome to point out errors and I will seriously correct them. . . .

The above is the detailed content of Detailed introduction to the principle of MySQL JOIN. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

MySQL: Simple Concepts for Easy Learning MySQL: Simple Concepts for Easy Learning Apr 10, 2025 am 09:29 AM

MySQL is an open source relational database management system. 1) Create database and tables: Use the CREATEDATABASE and CREATETABLE commands. 2) Basic operations: INSERT, UPDATE, DELETE and SELECT. 3) Advanced operations: JOIN, subquery and transaction processing. 4) Debugging skills: Check syntax, data type and permissions. 5) Optimization suggestions: Use indexes, avoid SELECT* and use transactions.

How to open phpmyadmin How to open phpmyadmin Apr 10, 2025 pm 10:51 PM

You can open phpMyAdmin through the following steps: 1. Log in to the website control panel; 2. Find and click the phpMyAdmin icon; 3. Enter MySQL credentials; 4. Click "Login".

MySQL: An Introduction to the World's Most Popular Database MySQL: An Introduction to the World's Most Popular Database Apr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

Why Use MySQL? Benefits and Advantages Why Use MySQL? Benefits and Advantages Apr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

How to use single threaded redis How to use single threaded redis Apr 10, 2025 pm 07:12 PM

Redis uses a single threaded architecture to provide high performance, simplicity, and consistency. It utilizes I/O multiplexing, event loops, non-blocking I/O, and shared memory to improve concurrency, but with limitations of concurrency limitations, single point of failure, and unsuitable for write-intensive workloads.

MySQL and SQL: Essential Skills for Developers MySQL and SQL: Essential Skills for Developers Apr 10, 2025 am 09:30 AM

MySQL and SQL are essential skills for developers. 1.MySQL is an open source relational database management system, and SQL is the standard language used to manage and operate databases. 2.MySQL supports multiple storage engines through efficient data storage and retrieval functions, and SQL completes complex data operations through simple statements. 3. Examples of usage include basic queries and advanced queries, such as filtering and sorting by condition. 4. Common errors include syntax errors and performance issues, which can be optimized by checking SQL statements and using EXPLAIN commands. 5. Performance optimization techniques include using indexes, avoiding full table scanning, optimizing JOIN operations and improving code readability.

MySQL's Place: Databases and Programming MySQL's Place: Databases and Programming Apr 13, 2025 am 12:18 AM

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

Monitor Redis Droplet with Redis Exporter Service Monitor Redis Droplet with Redis Exporter Service Apr 10, 2025 pm 01:36 PM

Effective monitoring of Redis databases is critical to maintaining optimal performance, identifying potential bottlenecks, and ensuring overall system reliability. Redis Exporter Service is a powerful utility designed to monitor Redis databases using Prometheus. This tutorial will guide you through the complete setup and configuration of Redis Exporter Service, ensuring you seamlessly build monitoring solutions. By studying this tutorial, you will achieve fully operational monitoring settings

See all articles