Analyzing and optimizing MySQL multi-table join query efficiency

Many webmasters do not analyze MySQL query optimization on large data sets carefully. The editor recently ran into an optimization problem on a table with one million (100W) rows. Below are some tests and analysis of MySQL join query optimization that I hope will help everyone.

1. Optimizing a simple correlated subquery

Subqueries often perform poorly in MySQL, which is a pity. In particular, an IN() subquery over tables of a certain size can take far too long. My MySQL knowledge is not deep, so I can only work out the reasons step by step.
Suppose there is an EXISTS query like this:

 select * from table1 
  where exists
      (select * from table2 where id>=30000 and table1.uuid=table2.uuid);

table1 has one hundred thousand rows and table2 has one million rows; in a local test the query takes 2.40s.

EXPLAIN shows that the subquery is a correlated subquery (DEPENDENT SUBQUERY): MySQL first performs a full table scan on the outer table1 and then executes the subquery once for each returned uuid. If the outer table is large, query performance will be even worse than in this test.

A simple optimization solution is to use the inner join method to replace the subquery. The query statement can be changed to:

 select * from table1 inner join table2 using(uuid) where table2.id>=30000;

The local test result took 0.68s.

EXPLAIN shows that MySQL now uses the SIMPLE select type (a query without subqueries or UNION). The optimizer first filters table2, then conceptually forms the Cartesian product of table1 and the filtered table2, and finally keeps only the rows that satisfy the join condition.
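One thing worth checking before swapping EXISTS for INNER JOIN is whether the two queries return the same rows. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, and the tiny tables and their contents are made up. It suggests that the join repeats a table1 row once per matching table2 row, so the rewrite matches the EXISTS result only when uuid is unique in the filtered table2 (or when DISTINCT is added).

```python
import sqlite3

# SQLite is used only so the sketch is runnable; timings and plans differ from MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (uuid TEXT, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, uuid TEXT);
    INSERT INTO table1 VALUES ('a', 'first'), ('b', 'second'), ('c', 'third');
    INSERT INTO table2 VALUES (29999, 'c'), (30000, 'a'), (30001, 'a'), (30002, 'b');
""")
# Hypothetical data: uuid a matches two table2 rows with id >= 30000,
# uuid b matches one, and uuid c only matches a row below the id threshold.

exists_rows = conn.execute("""
    SELECT table1.uuid FROM table1
    WHERE EXISTS (SELECT 1 FROM table2
                  WHERE table2.id >= 30000 AND table1.uuid = table2.uuid)
    ORDER BY table1.uuid
""").fetchall()

join_rows = conn.execute("""
    SELECT table1.uuid FROM table1
    INNER JOIN table2 USING (uuid)
    WHERE table2.id >= 30000
    ORDER BY table1.uuid
""").fetchall()

distinct_join_rows = conn.execute("""
    SELECT DISTINCT table1.uuid FROM table1
    INNER JOIN table2 USING (uuid)
    WHERE table2.id >= 30000
    ORDER BY table1.uuid
""").fetchall()

print(exists_rows)         # each table1 row at most once
print(join_rows)           # uuid a appears twice
print(distinct_join_rows)  # same rows as the EXISTS query
```

If uuid can repeat in table2, selecting only table1 columns with DISTINCT is one way to keep the result identical to the EXISTS version.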

2. Multi-table join query efficiency analysis and optimization
1. Multi-table join types
1. Cartesian product (cross join). In MySQL this can be written as CROSS JOIN, as JOIN with CROSS omitted, or with ',', for example:

  SELECT * FROM table1 CROSS JOIN table2;
  SELECT * FROM table1 JOIN table2;
  SELECT * FROM table1,table2;

Because the result is the product of the two joined tables, a Cartesian product is generally not recommended once a WHERE, ON or USING condition is involved: when the tables contain many rows it becomes very slow. LEFT [OUTER] JOIN or RIGHT [OUTER] JOIN is generally used instead.
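To make the size of a Cartesian product concrete, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the two tables with 3 and 4 rows are hypothetical. It counts the rows produced by the cross join forms above and by a join with a condition.

```python
import sqlite3

# Stand-in database so the example runs anywhere; table contents are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2), (3), (4);
""")

def count(sql):
    """Run a COUNT(*) query and return the single number it yields."""
    return conn.execute(sql).fetchone()[0]

cross_count = count("SELECT COUNT(*) FROM table1 CROSS JOIN table2")
comma_count = count("SELECT COUNT(*) FROM table1, table2")
joined_count = count(
    "SELECT COUNT(*) FROM table1 JOIN table2 ON table1.id = table2.id"
)

print(cross_count, comma_count, joined_count)  # 12 12 3
```

With 100,000 and 1,000,000 rows the unconstrained product would be 10^11 rows, which is why a join condition matters so much.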

2. Inner join (INNER JOIN). MySQL calls INNER JOIN an equijoin, meaning that you specify an equality join condition. In MySQL's grammar, CROSS JOIN and INNER JOIN are grouped together: join_table: table_reference [INNER | CROSS] JOIN table_factor [join_condition]

3. Outer joins. MySQL divides outer joins into left outer joins and right outer joins. In addition to the rows that satisfy the join condition, rows of the left table (left join) or right table (right join) that have no match are also returned, with NULL in the other table's columns.

Example:

user table:

  id | name
  ---+-------
  1  | libk
  2  | zyfon
  3  | daodao

user_action table:

  user_id | action
  --------+-------
  1       | jump
  1       | kick
  1       | jump
  2       | run
  4       | swim

sql:

  select id, name, action from user as u
  left join user_action a on u.id = a.user_id

result:

  id | name   | action
  ---+--------+-------
  1  | libk   | jump    ①
  1  | libk   | kick    ②
  1  | libk   | jump    ③
  2  | zyfon  | run     ④
  3  | daodao | NULL    ⑤

Analysis:
Note that user_action also contains a record with user_id=4, action=swim, but it does not appear in the result.
The user with id=3, name=daodao in the user table has no corresponding record in user_action, yet it does appear in the result set.
This is because the query is a left join, so everything is driven by the left table.
Results ①, ②, ③ and ④ are records present in both the left and the right table; ⑤ is a record that is only in the left table and not in the right table.

Working principle:

Read one record from the left table, select all right-table records (n of them) that match the ON condition, and join them to form n result rows (including duplicate rows, such as results ① and ③). If no right-table record matches the ON condition, the joined columns are NULL. Then continue with the next left-table record.
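The working principle above can be tried out directly. The following is a minimal sketch that assumes Python's built-in sqlite3 module in place of a MySQL server; the table definitions (column types, the NOT NULL on user_id) are assumptions, while the rows are the ones from the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_action (user_id INTEGER NOT NULL, action TEXT);
    INSERT INTO user VALUES (1, 'libk'), (2, 'zyfon'), (3, 'daodao');
    INSERT INTO user_action VALUES
        (1, 'jump'), (1, 'kick'), (1, 'jump'), (2, 'run'), (4, 'swim');
""")

# ORDER BY is added only to make the output order deterministic.
rows = conn.execute("""
    SELECT u.id, u.name, a.action
    FROM user AS u
    LEFT JOIN user_action AS a ON u.id = a.user_id
    ORDER BY u.id, a.rowid
""").fetchall()

for row in rows:
    print(row)
# (1, 'libk', 'jump')
# (1, 'libk', 'kick')
# (1, 'libk', 'jump')
# (2, 'zyfon', 'run')
# (3, 'daodao', None)
```

The row for user_id=4 never shows up because no left-table record refers to it, and daodao gets None (NULL) because nothing on the right matches.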

Extension:
Because NULL is shown when the right table has no ON match, we can use this rule to find all records that are in the left table but not in the right table. Note that the column used for the test must be declared NOT NULL.
For example:
sql:

  select id, name, action from user as u
  left join user_action a on u.id = a.user_id
  where a.user_id is NULL

(Note:

1. To test whether a column value is NULL, use IS NULL rather than = NULL.
2. The a.user_id column here must be declared NOT NULL.)
The result of the above sql:

  id | name   | action
  ---+--------+-------
  3  | daodao | NULL
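The "left table but not right table" query can be checked the same way. This sketch again assumes Python's built-in sqlite3 module as a stand-in for MySQL and assumed column types; it also shows what happens with = NULL, and a NOT EXISTS form that is a common alternative way to express the same question (not something the original test used).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_action (user_id INTEGER NOT NULL, action TEXT);
    INSERT INTO user VALUES (1, 'libk'), (2, 'zyfon'), (3, 'daodao');
    INSERT INTO user_action VALUES
        (1, 'jump'), (1, 'kick'), (1, 'jump'), (2, 'run'), (4, 'swim');
""")

base = """
    SELECT u.id, u.name, a.action
    FROM user AS u
    LEFT JOIN user_action AS a ON u.id = a.user_id
"""

# IS NULL keeps only the left rows that found no partner on the right.
anti_join = conn.execute(base + " WHERE a.user_id IS NULL").fetchall()

# "= NULL" is never true in SQL, so this filter removes every row.
equals_null = conn.execute(base + " WHERE a.user_id = NULL").fetchall()

# The same question asked with NOT EXISTS.
not_exists = conn.execute("""
    SELECT u.id, u.name FROM user AS u
    WHERE NOT EXISTS (SELECT 1 FROM user_action AS a WHERE a.user_id = u.id)
""").fetchall()

print(anti_join)    # [(3, 'daodao', None)]
print(equals_null)  # []
print(not_exists)   # [(3, 'daodao')]
```

Because user_id is declared NOT NULL, a NULL in a.user_id can only come from a missing match, which is what makes the IS NULL test reliable.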

General usage:

a. LEFT [OUTER] JOIN:

In addition to the rows that satisfy the join condition, rows of the left table that do not satisfy it are also returned, with NULL in the right table's columns.

  SELECT column_name FROM table1 LEFT [OUTER] JOIN table2 ON table1.column=table2.column

b. RIGHT [OUTER] JOIN:

RIGHT JOIN is similar to LEFT JOIN, except that in addition to the rows that satisfy the join condition, rows of the right table that do not satisfy it are also returned, with NULL in the left table's columns.

  SELECT column_name FROM table1 RIGHT [OUTER] JOIN table2 ON table1.column=table2.column

Tips:

1. on a.c1 = b.c1 is equivalent to using(c1)
2. INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition
3. When MySQL retrieves information from a table, you can give it a hint about which index to use.
This feature is useful if EXPLAIN shows that MySQL is using the wrong index from the list of possible indexes.
By specifying USE INDEX (key_list), you can tell MySQL to use only one of the named indexes to find rows in the table.
The alternative syntax IGNORE INDEX (key_list) tells MySQL not to use a particular index. For example:

  mysql> SELECT * FROM table1 USE INDEX (key1,key2)
      -> WHERE key1=1 AND key2=2 AND key3=3;
  mysql> SELECT * FROM table1 IGNORE INDEX (key3)
      -> WHERE key1=1 AND key2=2 AND key3=3;

2. Join constraints
Add explicit conditions with WHERE, ON or USING.

1. WHERE clause

  mysql> SELECT * FROM table1,table2 WHERE table1.id=table2.id;

2. ON

  mysql> SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id;

  mysql> SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id
      -> LEFT JOIN table3 ON table2.id=table3.id;

3. USING clause. If the two columns in the join condition have the same name in both tables, you can use USING.

For example:

SELECT FROM LEFT JOIN USING ()

Example of joining more than two tables:

  mysql> SELECT artists.Artist, cds.title, genres.genre
      -> FROM cds
      -> LEFT JOIN genres ON cds.genreID = genres.genreID
      -> LEFT JOIN artists ON cds.artistID = artists.artistID;

Or:

  mysql> SELECT artists.Artist, cds.title, genres.genre
      -> FROM cds
      -> LEFT JOIN genres ON cds.genreID = genres.genreID
      -> LEFT JOIN artists ON cds.artistID = artists.artistID
      -> WHERE (genres.genre = 'Pop');

--------------------------------------------

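Another point to note: when a MySQL query involves multiple tables, think about which join type will be most efficient for the query at hand.

1. Cross join (Cartesian product) or inner join: [INNER | CROSS] JOIN

2. Left outer join LEFT [OUTER] JOIN or right outer join RIGHT [OUTER] JOIN. Remember to specify the join condition with WHERE, ON or USING.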

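3. How MySQL optimizes LEFT JOIN and RIGHT JOIN
In MySQL, A LEFT JOIN B join_condition is executed as follows:

1) Table B is set to depend on table A and on all tables that A depends on.

2) Table A is set to depend on all tables (except B) that are used in the LEFT JOIN condition.

3) The LEFT JOIN condition is used to decide how to retrieve rows from table B. (In other words, no condition in the WHERE clause is used.)

4) All standard join optimizations are performed, except that a table is always read after all the tables it depends on. If there is a circular dependency, MySQL reports an error.

5) All standard WHERE optimizations are performed.

6) If a row in A matches the WHERE clause but no row in B matches the ON condition, an extra B row is generated with all columns set to NULL.

7) If you use LEFT JOIN to find rows that do not exist in some table and the WHERE part contains the test col_name IS NULL, where col_name is a column declared NOT NULL, MySQL stops searching for more rows (for that particular key combination) once it has found one row that matches the LEFT JOIN condition.

RIGHT JOIN is executed like LEFT JOIN, with the roles of the tables reversed.

The join optimizer calculates the order in which tables should be joined. The table read order forced by LEFT JOIN and STRAIGHT_JOIN helps the join optimizer work faster, because there are fewer table permutations to check. Note that this means that for a query of the following type, MySQL does a full scan on b, because the LEFT JOIN forces it to be read before d: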

  SELECT *
  FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key)
  WHERE b.key=d.key;

The fix in this case is to reverse the order in which a and b are listed in the FROM clause:

  SELECT *
  FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key)
  WHERE b.key=d.key;

MySQL performs the following LEFT JOIN optimization: if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join.

For example, the WHERE clause in the following query would be false if t2.column1 were NULL:

  SELECT * FROM t1 LEFT JOIN t2 ON (column1) WHERE t2.column2=5;

Therefore, it is safe to convert the query to a normal join:

  SELECT * FROM t1, t2 WHERE t2.column2=5 AND t1.column1=t2.column1;

This can be faster, because MySQL can use table t2 before table t1 if doing so results in a better query plan. To force a specific table order, use STRAIGHT_JOIN.
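The reason this conversion is safe can be seen on a tiny data set. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the rows in t1 and t2 are made up, and the join condition is spelled out as t1.column1 = t2.column1. It only compares result sets; it does not show MySQL's optimizer at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER, column2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 5), (2, 7);
""")

# Without a WHERE clause the LEFT JOIN produces a NULL-extended row for t1.column1 = 3.
unfiltered = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1 LEFT JOIN t2 ON t1.column1 = t2.column1
    ORDER BY t1.column1
""").fetchall()

# t2.column2 = 5 can never be true for a NULL-extended row, so that row is dropped.
left_filtered = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1 LEFT JOIN t2 ON t1.column1 = t2.column1
    WHERE t2.column2 = 5
""").fetchall()

inner_filtered = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1, t2
    WHERE t2.column2 = 5 AND t1.column1 = t2.column1
""").fetchall()

print(unfiltered)      # [(1, 5), (2, 7), (3, None)]
print(left_filtered)   # [(1, 5)]
print(inner_filtered)  # [(1, 5)]
```

Since the filter on t2 discards every NULL-extended row, the outer join and the normal join return the same rows, which is what frees the optimizer to pick either table order.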

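3. Using a cache

Community sharing sites are very popular right now, so let's take the Fanwe (方维) shopping-share site as an example. This is also a short summary of my secondary development work on that site; experts can skip ahead.

The key tables of a shopping-share site are: the share table, image table, file table, comment table, tag table, category table, and so on.
That is quite a few tables built around a share. When we view the details of an image, we have to display information from all of them: the category the image belongs to, the tags on the image, the comments on the image, and, if there is a file, its download information. Should we join 6 tables to get this? Of course we cannot afford that many joins. Instead we can query just one table. How? The share table is the main table, and we can add a cache field to it, say a field named cache_data of type text, which can hold a long string without exceeding the field's maximum storage.

How is the cache field used? After a new share is added, a share ID is generated. If the user publishes an image or a file, the image information goes into the image table and the file information into the file table, and the newly created image or file information is then also written into the cache field. Likewise, if the user chose a category or added tags, that information is written into the cache field too. For comments, there is no need to store all of them in the cache field, because you do not know how many there will be; store the latest 10 for display. The cache field thus becomes a two- or three-dimensional array, which is serialized and stored in the share table.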

  array(
    'img' => array(
      'name' => '123.jpg', 'url' => 'http://tech.42xiu.com/123.jpg',
      'width' => 800, 'height' => 600,
    ),
    'file' => array(
      'name' => 'abc.zip', 'download_url' => 'http://tech.42xiu.com/abc.zip',
      'size' => '1.2Mb',
    ),
    'category' => array(
      1 => array('id' => 5, 'name' => 'PHP乐知博客'),
      2 => array('id' => 6, 'name' => 'PHP技术博客'),
    ),
    'tag' => array(tag1, tag2, ......),
    'message' => array(
      1 => array(id, uid, name, content, time),
      2 => array(id, uid, name, content, time),
      3 => array(id, uid, name, content, time),
      4 => array(id, uid, name, content, time),
    ),
  )  // For example, the array structure above is serialized and stored in the database.

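  UPDATE share SET cache_data=mysql_real_escape_string(serialize($cache_data)) WHERE id=1;

This makes the query simple: you only need to fetch one row, read the cache field, unserialize it, extract the array contents, and render them on the page. With the previous structure, the site would probably have collapsed long ago at a few hundred thousand rows. Caching data this way may not be the best approach; if you have a better method, let's learn from each other and discuss.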
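As a rough sketch of the cache-field idea, the example below stores a serialized structure in share.cache_data and reads it back with a single-row query. Several things here are assumptions rather than the site's actual code: it uses Python with the built-in sqlite3 and json modules instead of PHP, MySQL and serialize(); the helper add_comment and the sample values are hypothetical; and it uses bound parameters instead of string escaping.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE share (id INTEGER PRIMARY KEY, title TEXT, cache_data TEXT)"
)
conn.execute("INSERT INTO share (id, title) VALUES (1, 'demo share')")

def add_comment(cache, comment, limit=10):
    """Put the newest comment first and keep only the latest `limit` entries."""
    cache["message"] = ([comment] + cache.get("message", []))[:limit]
    return cache

# Hypothetical cached data, shaped like the array in the article.
cache_data = {
    "img": {"name": "123.jpg", "url": "http://tech.42xiu.com/123.jpg",
            "width": 800, "height": 600},
    "tag": ["tag1", "tag2"],
    "message": [],
}
for n in range(1, 13):  # 12 comments arrive; only the newest 10 are cached
    add_comment(cache_data, {"id": n, "uid": 7, "name": "someone",
                             "content": f"comment {n}", "time": 1700000000 + n})

# Write the serialized structure with bound parameters (no manual escaping).
conn.execute("UPDATE share SET cache_data = ? WHERE id = ?",
             (json.dumps(cache_data), 1))

# Reading the detail page now needs one row from one table.
row = conn.execute("SELECT cache_data FROM share WHERE id = ?", (1,)).fetchone()
loaded = json.loads(row[0])

print(loaded["img"]["name"], loaded["tag"], len(loaded["message"]))
```

The trade-off is that every write to the image, file, tag or comment tables must also refresh cache_data, otherwise the page shows stale information.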