This article brings you relevant knowledge about mysql, which mainly introduces related issues about performance optimization, including query optimization and other contents. Let’s take a look at it together. I hope it will be helpful to everyone. helpful.
Recommended learning: mysql video tutorial
Before writing a fast query, you need to be clear that what really matters is the response time , and you need to know how long each step takes during the execution of the entire SQL statement. You need to know which steps are the key steps that drag down the execution efficiency. To do this, you must know the life cycle of the query. Then optimize. Different application scenarios have different optimization methods. Don’t generalize and analyze specific situations.
1. Network
2. CPU
3. IO
4. Context switching
5. System call
6. Generate statistical information
7. Lock waiting time
1. The main reason for low query performance is that too much data is accessed. Some queries inevitably need to filter a large amount of data. We can optimize by reducing the amount of data accessed
(1) Confirm whether the application is running Retrieve a large amount of data that exceeds what is needed
(2) Confirm whether the mysql server layer is analyzing a large number of data rows that exceed what is needed
2. Whether unnecessary data is requested from the database
(1) Query unnecessary records (we often mistakenly think that mysql will only return the required data. In fact, mysql returns all the results first and then performs calculations. In daily development habits, we often use the select statement to query first. A large number of results, and then close the result set after obtaining the first N rows. The optimization method is to add limit after the query)
(2) Return all columns when multiple tables are associated (select * from actor inner join film_actor using( actor_id) inner join film using(film_id) where film.title='Academy Dinosaur';select actor.* from actor...;)
(3) Always take out all columns (in the company's enterprise requirements , it is forbidden to use select *. Although this method can simplify development, it will affect the performance of the query, so try not to use it)
(4) Repeatedly query the same data (if you need to repeatedly execute the same Query, and returns exactly the same data every time. Therefore, based on such application scenarios, we can cache this part of the data, which can improve query efficiency.)
1. Query cache
Before parsing a query statement, if the query cache is turned on, then mysql will first check whether the query hits the data in the query cache. If If the query happens to hit the query cache, the user permissions will be checked before returning the results. If there is no problem with the permissions, then mysql will skip all stages and get the results directly from the cache and return them to the client
2. Query optimization processing
Mysql will go through the following steps after querying the cache: parsing SQL, preprocessing, and optimizing SQL execution plan. If any errors occur in these steps, may terminate the query.
(1) Grammar parser and preprocessing
mysql parses the SQL statement through keywords and generates a parse tree. The mysql parser will use mysql grammar rules to verify and parse the query. , such as verifying whether the wrong keywords are used or the order is correct, etc. The preprocessor will further check whether the parse tree is legal, such as whether the table name and column name exist, whether there is any ambiguity, and also verify permissions, etc.
( 2) Query optimizer
When there is no problem with the syntax tree, the optimizer will convert it into an execution plan. A query statement can use many execution methods, and the corresponding result can be obtained in the end. However, different execution methods bring different efficiencies. The main purpose of the optimizer is to select the most effective execution plan.
MySQL uses a cost-based optimizer. During optimization, it will try to predict the cost of a query using a certain query plan and select the one with the smallest cost.
a. select count(*) from film_actor; show status like 'last_query_cost';
You can see that this query statement requires about 1104 data pages to find the corresponding data. It is calculated through a series of statistical information.
(a) The number of pages in each table or index
(b) The cardinality of the index
(c) Index and data row length
(d) Index distribution
b. In many cases, mysql will choose the wrong execution plan for the following reasons:
(a ) Inaccurate statistical information (InnoDB cannot maintain accurate statistical information on the number of rows in a data table due to its mvcc architecture)
(b) The cost estimate of the execution plan is not equal to the actual execution cost (sometimes although an execution plan needs to read more pages, its cost is smaller, because if these pages are read sequentially Or if these pages are already in memory, then its access cost will be very small. The MySQL level does not know which pages are in memory and which are on disk, so it is impossible to know how many IO times are required during query execution. )
(c) The optimal value of mysql may be different from what you think (the optimization of mysql is based on the cost model optimization, but it may not be the fastest optimization)
(d ) mysql does not consider other concurrently executed queries
(e) mysql does not consider operating costs that are not under its control (the cost of executing stored procedures or user-defined functions)
c. Optimization Optimization strategy of the server
(a) Static optimization (directly analyze the parse tree and complete the optimization)
(b) Dynamic optimization (dynamic optimization is related to the context of the query, and may also be related to the context of the query) The value and the number of rows corresponding to the index are related)
(c) Mysql only needs to statically optimize the query once, but dynamic optimization needs to be re-evaluated every time it is executed
d, Optimizer's optimization type
(a) Redefine the order of related tables (the association of data tables is not always performed in the order specified in the query, and it is a very important function of the optimizer to determine the order of association)
(b) Convert outer joins into inner joins. Inner joins are more efficient than outer joins
(c) Using equivalent transformation rules, mysql can use some equivalent changes to simplify the union Planning expression
(d) Optimize count(), min(), max() (whether the index and column can be null can usually help mysql optimize this type of expression: for example, to find the minimum value of a certain column value, you only need to query the leftmost record of the index, no need for full-text scanning and comparison)
(e) Estimate and convert it into a constant expression, when mysql detects an expression that can be converted into a constant time, the expression will always be treated as a constant. (explain select film.film_id,film_actor.actor_id from film inner join film_actor using(film_id) where film.film_id = 1)
(f) Index coverage scan, when the columns in the index contain all the columns that need to be used in the query Columns, you can use covering index
(g) subquery optimization (mysql can convert the subquery into a more efficient form in some cases, thereby reducing multiple queries to the data multiple times. Access, for example, put frequently queried data into the cache.)
(h) Equivalent propagation (if the values of two columns are related by equality, then mysql can put the where condition of one of the columns Passed to another:
explain select film.film_id from film inner join film_actor using(film_id) where film.film_id > 500;
The film_id field is used here for equivalence association. The film_id column is not only applicable to the film table but also to the film_actor table
explain select film.film_id from film inner join film_actor using(film_id ) where film.film_id > 500 and film_actor.film_id > 500;)
e, Related query
Mysql's related query is very important, but in fact the strategy for executing related queries is relatively simple: mysql performs a nested loop related operation for any relationship, that is, mysql first loops out a single piece of data in a table, and then Nested into the next table to find matching rows, continue in sequence until matching rows are found in all tables. Then based on the matching rows in each table, return the columns required in the query. MySQL will try to find the matching rows in the last associated table After finding all matching rows, if the last related table cannot find more rows, mysql returns to the previous level related table to see if more matching records can be found, and so on iteratively. The overall idea is this, but It should be noted that there are many variants in the actual execution process:
f, sorting optimization
Sorting is a very costly operation anyway, so from a performance perspective, you should try your best to It is possible to avoid sorting or avoid sorting a large amount of data as much as possible.
It is recommended to use indexes for sorting, but when the index cannot be used, MySQL needs to sort itself. If the amount of data is small, it will be done in memory. If the data If the amount is large, you need to use a disk, which is called filesort in mysql.
If the amount of data to be sorted is less than the sort buffer (show variables like '%sort_buffer_size%';), mysql uses memory for quick sorting. If there is not enough memory, Sort, then mysql will first divide the tree into blocks, sort each independent block using quick sort, store the sort results of each block on the disk, then merge the sorted blocks, and finally return to sorting As a result, the following is the sorting algorithm:
(a) Two transmission sorting
The first data reading is to read the fields that need to be sorted, and then sort them. The second time It is to read the data rows according to the sorted results as needed.
This method is relatively inefficient. The reason is that when reading the data for the second time, because it has been sorted, all records need to be read. At this time, more random IO is used, and the cost of reading data will be higher
The advantage of two transmissions is to store as little data as possible during sorting, so that the sorting buffer can accommodate as many rows as possible for sorting operations
(b) Single transmission sorting
先读取查询所需要的所有列,然后再根据给定列进行排序,最后直接返回排序结果,此方式只需要一次顺序IO读取所有的数据,而无须任何的随机IO,问题在于查询的列特别多的时候,会占用大量的存储空间,无法存储大量的数据
(c)如何选择
当需要排序的列的总大小超过max_length_for_sort_data定义的字节,mysql会选择双次排序,反之使用单次排序,当然,用户可以设置此参数的值来选择排序的方式
1、优化count()查询
count()是特殊的函数,有两种不同的作用,一种是某个列值的数量,也可以统计行数。
(1)总有人认为myisam的count函数比较快,这是有前提条件的,只有没有任何where条件的count(*)才是比较快的。
(2)使用近似值
在某些应用场景中,不需要完全精确的值,可以参考使用近似值来代替,比如可以使用explain来获取近似的值
其实在很多OLAP的应用中,需要计算某一个列值的基数,有一个计算近似值的算法叫hyperloglog。
(3)更复杂的优化
一般情况下,count()需要扫描大量的行才能获取精确的数据,其实很难优化,在实际操作的时候可以考虑使用索引覆盖扫描,或者增加汇总表,或者增加外部缓存系统。
2、优化关联查询
(1)确保on或者using子句中的列上有索引,在创建索引的时候就要考虑到关联的顺序
当表A和表B使用列C关联的时候,如果优化器的关联顺序是B、A,那么就不需要再B表的对应列上建上索引,没有用到的索引只会带来额外的负担,一般情况下来说,只需要在关联顺序中的第二个表的相应列上创建索引。
(2)确保任何的groupby和order by中的表达式只涉及到一个表中的列,这样mysql才有可能使用索引来优化这个过程
3、优化子查询
子查询的优化最重要的优化建议是尽可能使用关联查询代替
4、优化limit分页
在很多应用场景中我们需要将数据进行分页,一般会使用limit加上偏移量的方法实现,同时加上合适的orderby 的子句,如果这种方式有索引的帮助,效率通常不错,否则的话需要进行大量的文件排序操作,还有一种情况,当偏移量非常大的时候,前面的大部分数据都会被抛弃,这样的代价太高。
要优化这种查询的话,要么是在页面中限制分页的数量,要么优化大偏移量的性能。
优化此类查询的最简单的办法就是尽可能地使用覆盖索引,而不是查询所有的列。
select film_id,description from film order by title limit 50,5; explain select film.film_id,film.description from film inner join (select film_id from film order by title limit 50,5) as lim using(film_id);
5、优化union查询
mysql总是通过创建并填充临时表的方式来执行union查询,因此很多优化策略在union查询中都没法很好的使用。经常需要手工的将where、limit、order by等子句下推到各个子查询中,以便优化器可以充分利用这些条件进行优化
除非确实需要服务器消除重复的行,否则一定要使用union all,因此没有all关键字,mysql会在查询的时候给临时表加上distinct的关键字,这个操作的代价很高。
6、推荐使用用户自定义变量
用户自定义变量是一个容易被遗忘的mysql特性,但是如果能够用好,在某些场景下可以写出非常高效的查询语句,在查询中混合使用过程化和关系话逻辑的时候,自定义变量会非常有用。
用户自定义变量是一个用来存储内容的临时容器,在连接mysql的整个过程中都存在。
(1)自定义变量的使用
set @one :=1 set @min_actor :=(select min(actor_id) from actor) set @last_week :=current_date-interval 1 week;
(2)自定义变量的限制
a、无法使用查询缓存
b、不能在使用常量或者标识符的地方使用自定义变量,例如表名、列名或者limit子句
c、用户自定义变量的生命周期是在一个连接中有效,所以不能用它们来做连接间的通信
d、不能显式地声明自定义变量地类型
e、mysql优化器在某些场景下可能会将这些变量优化掉,这可能导致代码不按预想地方式运行
f、赋值符号:=的优先级非常低,所以在使用赋值表达式的时候应该明确的使用括号。
g、使用未定义变量不会产生任何语法错误。
(3)自定义变量的使用案例
a、优化排名语句
在给一个变量赋值的同时使用这个变量
select actor_id,@rownum:=@rownum+1 as rownum from actor limit 10;
查询获取演过最多电影的前10名演员,然后根据出演电影次数做一个排名
select actor_id,count(*) as cnt from film_actor group by actor_id order by cnt desc limit 10;
b、避免重新查询刚刚更新的数据
当需要高效的更新一条记录的时间戳,同时希望查询当前记录中存放的时间戳是什么
update t1 set lastUpdated=now() where id =1; select lastUpdated from t1 where id =1; update t1 set lastupdated = now() where id = 1 and @now:=now(); select @now;
c、确定取值的顺序
在赋值和读取变量的时候可能是在查询的不同阶段
(a)set @rownum:=0;
select actor_id,@rownum:=@rownum+1 as cnt from actor where @rownum<=1;
因为where和select在查询的不同阶段执行,所以看到查询到两条记录,这不符合预期
(b)set @rownum:=0;
select actor_id,@rownum:=@rownum+1 as cnt from actor where @rownum<=1 order by first_name
当引入了orde;r by之后,发现打印出了全部结果,这是因为order by引入了文件排序,而where条件是在文件排序操作之前取值的 。
(c)解决这个问题的关键在于让变量的赋值和取值发生在执行查询的同一阶段:
set @rownum:=0; select actor_id,@rownum as cnt from actor where (@rownum:=@rownum+1)<=1;
推荐学习:mysql视频教程
The above is the detailed content of MySQL performance tuning query optimization. For more information, please follow other related articles on the PHP Chinese website!