In-depth understanding of MySQL temporary tables

Overview

There are two main types of temporary tables in MySQL: external temporary tables and internal temporary tables. An external temporary table is one created with a CREATE TEMPORARY TABLE ... statement. It is valid only in the session that created it, and when that session disconnects, the temporary table and its data are dropped automatically. Internal temporary tables also come in two main kinds: the temporary tables in information_schema, and the temporary tables generated while a session executes a query whose execution plan contains "Using temporary". One difference between internal and external temporary tables is that the table definition file (.frm) of an internal temporary table cannot be seen. The .frm file of an external temporary table is generally named #sql{process id}_{thread id}_{serial number}, so different sessions can create temporary tables with the same name.
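To illustrate why this naming lets sessions reuse a table name, here is a minimal Python sketch (a conceptual model, not MySQL source code). It assumes a simplified file name format of `#sql{pid}_{thread_id}_{serial}.frm`; the exact way MySQL formats each field may differ, and the `Session` class is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


# Hypothetical model of per-session temporary table naming; the real MySQL
# formatting of each field may differ from this simplified version.
@dataclass
class Session:
    pid: int          # server process id (shared by all sessions)
    thread_id: int    # per-connection thread id (unique per session)
    serial: int = 0   # per-session counter, bumped for each new temp table
    tables: Dict[str, str] = field(default_factory=dict)

    def create_temporary_table(self, name: str) -> str:
        """Map a user-visible table name to a session-unique physical file name."""
        physical = f"#sql{self.pid}_{self.thread_id}_{self.serial}.frm"
        self.serial += 1
        self.tables[name] = physical
        return physical


s1 = Session(pid=4242, thread_id=7)
s2 = Session(pid=4242, thread_id=8)
# Both sessions use the same logical name "tmp" without colliding on disk.
print(s1.create_temporary_table("tmp"))  # #sql4242_7_0.frm
print(s2.create_temporary_table("tmp"))  # #sql4242_8_0.frm
```

Because the thread id differs per connection, the same logical name never maps to the same physical file.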

Temporary table

The main difference between temporary tables and ordinary tables is that a temporary table's data is cleaned up automatically when the instance, session, or statement ends. For example, an internal temporary table that stores the intermediate result set of a query is reclaimed automatically once the query completes, without affecting the structure or data of any user table. In addition, temporary tables in different sessions can have the same name, so multiple sessions running queries that use temporary tables need not worry about name conflicts. Since the temporary tablespace was introduced in MySQL 5.7, non-compressed InnoDB temporary tables are all stored in the temporary tablespace, and the space within it can be reused. Temporary tables support not only the InnoDB engine but also the MyISAM and MEMORY engines, among others. So although we cannot see a data file (.ibd) for a temporary table, that does not mean it is a memory table: it may also be stored in the temporary tablespace.
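The session-scoped lifetime can be sketched in Python as follows. This is a hypothetical model, assuming a shared `registry` dict keyed by session id; it only illustrates that each session gets a private namespace that is discarded when the session ends, not how the server actually tracks temporary tables.

```python
from typing import Dict, List


# Hypothetical model: temporary tables live in a per-session namespace and
# are dropped automatically when the session (context manager) ends.
class Session:
    def __init__(self, registry: Dict[int, Dict[str, List[tuple]]], session_id: int):
        self.registry = registry
        self.session_id = session_id

    def __enter__(self) -> "Session":
        self.registry[self.session_id] = {}  # fresh, private namespace
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Session disconnect: every temporary table it owns is dropped.
        del self.registry[self.session_id]

    def create_temporary_table(self, name: str) -> List[tuple]:
        rows: List[tuple] = []
        self.registry[self.session_id][name] = rows
        return rows


registry: Dict[int, Dict[str, List[tuple]]] = {}
with Session(registry, 1) as a, Session(registry, 2) as b:
    a.create_temporary_table("tmp").append((1, "a"))
    b.create_temporary_table("tmp").append((2, "b"))  # same name, no clash
    print(len(registry))  # 2: each session has its own "tmp"
print(registry)  # {}: both sessions ended, so their temporary tables are gone
```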

Temporary table VS memory table

A temporary table can use either the InnoDB engine or the MEMORY engine. Here, "memory table" means a MEMORY engine table, created with CREATE TABLE ... ENGINE=MEMORY: all of its data lives in memory, while its table structure is managed through an .frm file. For internal MEMORY engine tables, by contrast, no .frm file can be seen; there is not even an information_schema directory on disk. Inside MySQL, the temporary tables in information_schema come in two kinds: InnoDB temporary tables and MEMORY temporary tables. For example, the TABLES table is a MEMORY temporary table, while COLUMNS and PROCESSLIST are InnoDB temporary tables.

All data in a memory table is held in memory, organized as an array (a heap table), and all data operations are performed in memory. For small data volumes this is relatively fast, because no physical I/O is involved. But memory is a limited resource, so when the data volume is large a memory table is not suitable, and a disk-based temporary table (InnoDB engine) is used instead. That table uses a B-tree storage structure. Because the InnoDB buffer pool is a shared resource, the temporary table's data may displace some of the hot data in the buffer pool, and operations on it may also involve physical I/O. MEMORY engine tables can in fact have indexes, both BTREE and HASH, so queries against them are very fast; their main drawback is limited memory.
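The heap-table layout and its two index types can be modeled roughly in Python. In this sketch, which assumes integer keys in column 0, rows are appended to a list, a `dict` stands in for a HASH index, and a sorted list searched with `bisect` stands in for a BTREE index. The `HeapTable` class is hypothetical and is not how the MEMORY engine is implemented.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class HeapTable:
    """Toy in-memory table: rows in an array plus two kinds of index on column 0."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, str]] = []                     # the "heap" array
        self.hash_idx: Dict[int, List[int]] = defaultdict(list)  # key -> row positions
        self.btree_idx: List[Tuple[int, int]] = []                # sorted (key, position)

    def insert(self, row: Tuple[int, str]) -> None:
        pos = len(self.rows)
        self.rows.append(row)
        self.hash_idx[row[0]].append(pos)
        bisect.insort(self.btree_idx, (row[0], pos))

    def lookup_eq(self, key: int) -> List[Tuple[int, str]]:
        # HASH-style index: fast equality lookup, no ordering.
        return [self.rows[p] for p in self.hash_idx.get(key, [])]

    def lookup_range(self, lo: int, hi: int) -> List[Tuple[int, str]]:
        # BTREE-style index: ordered, so it can also answer range queries.
        start = bisect.bisect_left(self.btree_idx, (lo, -1))
        end = bisect.bisect_right(self.btree_idx, (hi, len(self.rows)))
        return [self.rows[p] for _, p in self.btree_idx[start:end]]


t = HeapTable()
for row in [(3, "c"), (1, "a"), (2, "b"), (3, "c2")]:
    t.insert(row)
print(t.lookup_eq(3))        # [(3, 'c'), (3, 'c2')]
print(t.lookup_range(1, 2))  # [(1, 'a'), (2, 'b')]
```

The hash index answers only equality lookups, while the ordered index also answers range queries, which mirrors the usual trade-off between the two index types.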

Scenarios of using temporary tables

As mentioned earlier, a temporary table is used whenever the execution plan contains "Using temporary". Two main scenarios are described below.

The test table structure is as follows:

mysql> show create table t1_normal\G
*************************** 1. row ***************************
       Table: t1_normal
Create Table: CREATE TABLE `t1_normal` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c1` int(11) DEFAULT NULL,
  `c2` int(11) DEFAULT NULL,
  `c3` int(11) DEFAULT NULL,
  `c4` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=770023 DEFAULT CHARSET=utf8

Scenario 1: union

mysql> explain select * from t1_normal union select * from t1_normal; 
+------+--------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
| id   | select_type  | table      | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra           |
+------+--------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
|    1 | PRIMARY      | t1_normal  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | NULL            |
|    2 | UNION        | t1_normal  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   NULL |     NULL | Using temporary |
+------+--------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------+

UNION returns the union of the two subquery results and keeps only one copy of any duplicate rows. MySQL resolves the duplicates by creating a temporary table with a primary key and storing the final result set in it, which is why "Using temporary" appears in the Extra column of the execution plan. The related UNION ALL operation also merges the results of two subqueries, but it does not remove duplicates. Since UNION ALL has no deduplication step, it needs no temporary table.
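The difference between the two operators can be modeled in a few lines of Python. This is a conceptual sketch, assuming rows are tuples and a `dict` keyed on the whole row plays the role of the keyed temporary table; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, ...]


def union_all(a: Iterable[Row], b: Iterable[Row]) -> List[Row]:
    # UNION ALL: just concatenate both inputs; no dedup, so no temporary table.
    return [*a, *b]


def union(a: Iterable[Row], b: Iterable[Row]) -> List[Row]:
    # UNION: insert every row into a "temporary table" keyed on the whole row;
    # a row whose key already exists is skipped.
    temp_table: Dict[Row, None] = {}
    for row in [*a, *b]:
        if row not in temp_table:
            temp_table[row] = None
    return list(temp_table)


t1 = [(1, 10), (2, 20)]
print(union_all(t1, t1))  # [(1, 10), (2, 20), (1, 10), (2, 20)]
print(union(t1, t1))      # [(1, 10), (2, 20)]
```

This sketch preserves first-seen order only because Python dicts do; SQL guarantees no row order without ORDER BY.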

mysql> explain select * from t1_normal  union  all select * from t1_normal;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | PRIMARY     | t1_normal | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | NULL  |
|  2 | UNION       | t1_normal | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+

Scenario 2: group by

mysql> explain select c1,count(*) as count from t1_normal group by c1;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                           |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+
|  1 | SIMPLE      | t1_normal | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | Using temporary; Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+

GROUP BY groups rows by the specified column and, by default (in MySQL 5.7 and earlier), also orders the result by that column. The SQL statement above groups the rows of t1_normal by the value of column c1 and counts the records for each value of c1. The execution plan shows "Using temporary; Using filesort". To count how many times each value occurs, MySQL uses a temporary table so that each value can be located quickly: if the value is not yet present, a new record is inserted; if it is present, that record's count is incremented. That is why "Using temporary" appears. And because GROUP BY implies sorting, the records must also be sorted by column c1, which is why "Using filesort" appears.
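The temporary-table strategy for GROUP BY can be sketched in Python as follows, assuming integer c1 values and a `dict` standing in for the temporary table keyed on c1. The `order_by_null` flag (a hypothetical parameter) simply skips the final sort step.

```python
from typing import Dict, Iterable, List, Tuple


def group_count(c1_values: Iterable[int], order_by_null: bool = False) -> List[Tuple[int, int]]:
    """Simulate `SELECT c1, COUNT(*) ... GROUP BY c1` using a keyed temporary table."""
    temp_table: Dict[int, int] = {}  # "Using temporary": one row per distinct c1
    for v in c1_values:
        if v not in temp_table:
            temp_table[v] = 1        # key not found: insert a new record
        else:
            temp_table[v] += 1       # key found: accumulate the count
    result = list(temp_table.items())
    if not order_by_null:
        result.sort()                # "Using filesort": implicit sort by c1
    return result


data = [3, 1, 3, 2, 1, 3]
print(group_count(data))                      # [(1, 2), (2, 1), (3, 3)]
print(group_count(data, order_by_null=True))  # [(3, 3), (1, 2), (2, 1)]
```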

1). Eliminate filesort

In fact, the implicit sorting of GROUP BY can also be suppressed explicitly:

mysql> explain select c1,count(*) as count from t1_normal group by c1 order by null;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra           |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
|  1 | SIMPLE      | t1_normal | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | Using temporary |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-----------------+

You can see that after adding "order by null" to the statement, "Using filesort" no longer appears in the execution plan.

2). Eliminate temporary tables

mysql> explain select SQL_BIG_RESULT c1,count(*) as count from t1_normal group by c1;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | t1_normal | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 523848 |   100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+

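As you can see, "Using temporary" no longer appears in the execution plan, so GROUP BY does not necessarily depend on a temporary table; the main role of the temporary table in GROUP BY is deduplication. There is therefore another approach that uses no temporary table at all: sort the rows directly in sort_buffer (when sort_buffer is not large enough, a file sort is performed; specifically, each sorted array is written out as a separate file, and the files are then combined with an external merge sort), and then scan the sorted rows to produce the aggregated result set.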
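The sort-based alternative can be sketched in Python as follows. This is a simplified model under stated assumptions: the sort buffer is measured in rows (`sort_buffer_rows`, a hypothetical parameter; MySQL's real `sort_buffer_size` is in bytes), each sorted run stands in for a temporary file, and `heapq.merge` performs the k-way merge.

```python
import heapq
from itertools import groupby
from typing import Iterator, List, Tuple


def external_sort(values: List[int], sort_buffer_rows: int) -> Iterator[int]:
    """Sort in sort_buffer-sized chunks ("runs"), then k-way merge the runs."""
    runs = []
    for i in range(0, len(values), sort_buffer_rows):
        # Each sorted run would be written to its own file in a real filesort.
        runs.append(sorted(values[i:i + sort_buffer_rows]))
    return heapq.merge(*runs)


def group_count_by_sorting(values: List[int], sort_buffer_rows: int) -> List[Tuple[int, int]]:
    # No keyed temporary table: after sorting, equal keys are adjacent,
    # so one sequential scan produces the aggregates.
    return [(k, sum(1 for _ in grp))
            for k, grp in groupby(external_sort(values, sort_buffer_rows))]


print(group_count_by_sorting([3, 1, 3, 2, 1, 3], sort_buffer_rows=2))
# [(1, 2), (2, 1), (3, 3)]
```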

3). SQL_BIG_RESULT

Our statement also used the SQL_BIG_RESULT hint, and it is precisely this hint that caused the temporary table to be skipped. First, here is what SQL_BIG_RESULT and SQL_SMALL_RESULT mean.

SQL_SMALL_RESULT: explicitly requests an in-memory table (MEMORY engine)

SQL_BIG_RESULT: explicitly requests an on-disk temporary table (MyISAM or InnoDB engine)

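The difference is that an on-disk temporary table can use its primary key to deduplicate and sort, which suits large data volumes, while a memory table is faster to write to and its contents are then sorted in memory, which suits small data volumes. The following description is excerpted from the MySQL manual.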

SQL_BIG_RESULT or SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set has many rows or is small, respectively.

For SQL_BIG_RESULT, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on the GROUP BY elements.

For SQL_SMALL_RESULT, MySQL uses fast temporary tables to store the resulting table instead of using sorting.

This should not normally be needed.

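Back to the question at hand: from the hint, the MySQL optimizer knows that a disk-based temporary table would be needed, and in the end it directly chose the lighter-weight approach of array storage plus filesort instead.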

How to avoid using temporary tables

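The usual SQL optimization is to index the GROUP BY column. GROUP BY can then scan that column directly in index order and count as it goes, so neither a temporary table nor a filesort is needed.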
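Index-based grouping can be sketched in Python like this. It assumes a hypothetical `SecondaryIndex` that keeps `(c1, id)` pairs in sorted order, standing in for a B-tree index on c1; because entries are read in key order, the aggregation needs only the current key and a running count.

```python
import bisect
from typing import List, Optional, Tuple


class SecondaryIndex:
    """Hypothetical stand-in for an index on c1: (c1, id) entries kept sorted."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []

    def insert(self, c1: int, row_id: int) -> None:
        bisect.insort(self.entries, (c1, row_id))  # index maintained on every write


def group_count_via_index(index: SecondaryIndex) -> List[Tuple[int, int]]:
    # Entries arrive already ordered by c1, so only the current key and its
    # running count are kept: no temporary table and no sort step.
    result: List[Tuple[int, int]] = []
    current: Optional[int] = None
    count = 0
    for c1, _row_id in index.entries:
        if c1 == current:
            count += 1
        else:
            if current is not None:
                result.append((current, count))
            current, count = c1, 1
    if current is not None:
        result.append((current, count))
    return result


idx = SecondaryIndex()
for row_id, c1 in enumerate([3, 1, 3, 2, 1, 3], start=1):
    idx.insert(c1, row_id)
print(group_count_via_index(idx))  # [(1, 2), (2, 1), (3, 3)]
```

The cost moves to write time: every insert keeps the index sorted, which is the trade-off behind adding an index for a frequent GROUP BY.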

mysql> alter table t1_normal add index idx_c1(c1);
Query OK, 0 rows affected (1 min 23.82 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select c1,count(*) as count from t1_normal group by c1 order by null;
+----+-------------+-----------+------------+-------+---------------+--------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key    | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+--------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | t1_normal | NULL       | index | idx_c1        | idx_c1 | 5       | NULL | 523848 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+--------+---------+------+--------+----------+-------------+
