Why is count(*) so slow? Cause Analysis-Mysql Tutorial-php.cn

Table of Contents

必要知识点

原因分析

Execution process

Summary

Home

Database

Mysql Tutorial

Why is count(*) so slow? Cause Analysis

青灯夜游

Jan 05, 2023 pm 09:21 PM

mysql rear end

Why is count(*) so slow? The following article will analyze the reasons for you and talk about the execution process of count(*). I hope it will be helpful to everyone!

Why is count(*) so slow? Cause Analysis

I didn’t want to write this article, because I think most experienced developers have encountered this thing and must have understood the related reasons, but recently I saw that several technical public accounts that follow are pushing related articles. It really surprised me!

First go to the conclusion of the public account article:

count(*): It will obtain the data of all rows without any processing, and the number of rows will be added 1.
count(1): It will get the data of all rows, with a fixed value of 1 for each row, which is also the number of rows plus 1.
count(id): id represents the primary key. It needs to parse the id field from all rows of data. The id must not be NULL, and the number of rows is increased by 1.
count (ordinary index column): It needs to parse the ordinary index column from the data of all rows, and then determine whether it is NULL. If not, the row number is 1.
count (unindexed column): It scans the entire table to obtain all data, does not add indexed columns in the analysis, and then determines whether it is NULL. If it is not NULL, the number of rows is 1.

Conclusion: count(*) ≈ count(1) > count(id) > count (ordinary index column) > count (unindexed column)

I I don’t want to give it away, the above conclusion is pure fart. It is simply something that comes out of personal fantasies, and I am not even willing to verify it. Even if I take a look at the execution plan, I cannot come to such an outrageous conclusion.

I can’t believe this is an article that has been reposted by multiple technical public accounts!

All the following content is based on the analysis of

mysql 5.7 InnoDB engine.

Extension:

MyISAM If there are no query conditions, but simply count the total number of data in the table, the return will be super fast, because the total number of rows in the table information obtained by the service layer is accurate , while InnoDB is just an estimate.

Example

Without further ado, let’s look at an example first.

The following is a table with a data volume of 1 million. The fields in the table are relatively short, and the overall data volume is not large.

CREATE TABLE `hospital_statistics_data` (
  `pk_id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT &#39;主键&#39;,
  `id` varchar(36) COLLATE utf8mb4_general_ci NOT NULL COMMENT &#39;外键&#39;,
  `hospital_code` varchar(36) COLLATE utf8mb4_general_ci NOT NULL COMMENT &#39;医院编码&#39;,
  `biz_type` tinyint NOT NULL COMMENT &#39;1服务流程  2管理效果&#39;,
  `item_code` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT &#39;考核项目编码&#39;,
  `item_name` varchar(64) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT &#39;考核项目名称&#39;,
  `item_value` varchar(36) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT &#39;考核结果&#39;,
  `is_deleted` tinyint DEFAULT NULL COMMENT &#39;是否删除 0否 1是&#39;,
  `gmt_created` datetime DEFAULT NULL COMMENT &#39;创建时间&#39;,
  `gmt_modified` datetime DEFAULT NULL COMMENT &#39;gmt_modified&#39;,
  `gmt_deleted` datetime(3) DEFAULT &#39;9999-12-31 23:59:59.000&#39; COMMENT &#39;删除时间&#39;,
  PRIMARY KEY (`pk_id`)
) DEFAULT CHARSET=utf8mb4  COMMENT=&#39;医院统计数据&#39;;

Copy after login

The initial state of this table has only one

clustered index.

The following is a breakdown of different index situations. Let’s take a look at the execution plan of COUNT(*).

1) Take a look at the execution plan when there is only one clustered index.

EXPLAIN select COUNT(*) from hospital_statistics_data;

Copy after login

Result:

#The meaning of each parameter of the execution plan is not within the scope of this article, you can understand it yourself.

Only focus on the following attributes here.

type: Index is displayed here, indicating that the index is used.
key: PRIMARY uses the primary key index.
key_len: Index length 8 bytes.

There is a very critical point here:

count(*) will also go to the index , and in the current case a clustered index is used.

Okay, let’s look down.

2) There is a non-clustered index (secondary index)

Add a hospital_code index to the table.

alter table hospital_statistics_data add index idx_hospital_code(hospital_code)

Copy after login

There are two indexes in the table at this time,

primary key and hospital_code.

Similarly, execute it again:

EXPLAIN select COUNT(*) from hospital_statistics_data;

Copy after login

Result:

##Similarly, look at the three fields of type, key and key_len .

Do you feel a little "

magical

"? Why does the index change to the newly added

idx_hospital_code

. Don’t rush to the conclusion, and then look at the following situation.

3) There are two non-clustered indexes (secondary indexes)

On the basis of the above, add another secondary index.

alter table hospital_statistics_data add index idx_biz_type(biz_type)

Copy after login

There are three indexes in the table at this time, primary key, hospital_code and biz_type.

Similarly, execute:

EXPLAIN select COUNT(*) from hospital_statistics_data;

Copy after login

Result:

Are you even more confused? The index is... and... .Changed.

becomes the newly added idx_biz_type.

Let’s not talk about why the above changes occurred, let’s continue the analysis below.

Based on the above three indexes, let’s take a look at them respectively,

count(1)

, count(id), count(index), count (no index)What are the differences between these four situations and the execution plan of count(*).

##count(id) For the sample table, the primary key is pk_id

Why is count(*) so slow? Cause Analysis

count(index)

这里选取biz_type索引字段。

count(无索引)

小结：

count(index) 会使用当前index指定的索引。
count(无索引) 是全表扫描，未走索引。
count(1) , count(*), count(id) 一样都会选择idx_biz_type索引

看到这，你还觉得那些千篇一律的公众号文章的结论正确吗？

必要知识点

mysql 分为service层和引擎层。
所有的sql在执行前会经过service层的优化，优化分为很多类型，简单的来说可分为成本和规则。
执行计划所反映的是service层经过sql优化后，可能的执行过程。并非绝对（免得有些人说我只看执行计划过于片面）。绝大多数情况执行计划是可信的。
索引类型分为聚簇索引和非聚簇索引（二级索引）。其中数据都是挂在聚簇索引上的，非聚簇索引上只是记录的主键id。
抛开数据内存，只谈数据量，都是扯淡。什么500w就是极限，什么2个表以上的join都需要优化了，什么is null不会走索引等，纯纯的放屁。
相信一点，编写mysql代码的人比，看此文章的大部分人都要优秀。他们会尽可能在执行前，对我这样菜逼写的乱七八糟的sql进行优化。

原因分析

其实原因非常非常简单，上面也说了，service层会基于成本进行优化。

并且，正常情况下，非聚簇索引所占有的内存要远远小于聚簇索引。所以问题来了，如果你是mysql的开发人员，你在执行count(*)查询的时候会使用那个索引？

我相信正常人都会使用非聚簇索引。

那如果存在2个甚至多个非聚簇索引又该如何选择呢？

那肯定选择最短的，占用内存最小的一个呀，在回头看看上面的实例，还迷惑吗。

同样都是非聚簇索引。idx_hospital_code的len是146字节；而idx_biz_type的len只有1。那还要选吗？

那为何count(*)走了索引，却还是很慢呢？

这里要明确一点，索引只是提升效率的一种方式，但不能完全的解决效率问题。count(*)有一个明显的缺陷，就是它要计算总数，那就意味着要遍历所有符合条件的数据，相当于一个计数器，在数据量足够大的情况下，即使使用非聚簇索引也无法优化太多。

官方文档：

InnoDBhandlesSELECT COUNT(*)andSELECT COUNT(1)operations in the same way. There is no performance difference.

简单的来说就是，InnoDB下 count(*) 等价于 count(1)

既然会自动走索引，那么上面那个所谓的速度排序还觉得对吗？ count(*)的性能跟数据量有很大的关系，此外最好有一个字段长度较短的二级索引。

拓展：

另外，多说一下，关于网上说的那些索引失效的情况，大多都是片面的，我这里只说一点。量变才能引起质变，索引的失效取决于你圈定数据的范围，若你圈定的数据量占整体数据量的比例过高，则会放弃使用索引，反之则会优先使用索引。但是此规则并不是完美的，有时候可能与你预期的不同，也可以通过一些技巧强制使用索引，但这种方式少用。

举个栗子：

通过上面这个表hospital_statistics_data，我进行了如下查询：

select * from hospital_statistics_data where hospital_code is not null;

Copy after login

此时这个sql会使用到hospital_code的索引吗？

这里也不卖关子了，若hospital_code只有很少一部分数据是null值，那么将不会走索引，反之则走索引。

原因就2个字：回表。

It’s like buying sugar oranges. If you only buy a few kilograms, then you can just pick the best ones in the basket. But if you want to buy a basket, I believe the boss will not let you pick one by one, but will give you a whole basket at once. Of course, everyone is not stupid, and they all know that there must be a few bad fruits in the basket. But this is the most efficient and causes less loss to the boss.

Execution process

Excerpted from "Understanding MySQL from the Root". I strongly recommend that those who have not systematically learned MySQL read this book.

1. First maintain a count variable in the server layer

2. The server layer asks the InnoDB engine for the first record

3. InnoDB finds the first secondary index record and return it to the server layer (note: since it is only counting the number of records at this time, there is no need to return the table)

4. Since the parameter of the COUNT function is *, MySQL will treat * as a constant 0 . Since 0 is not NULL, the server layer adds 1 to the count variable.

5. The server layer asks InnoDB for the next record.

6.InnoDB finds the next secondary index record through the next_record attribute of the secondary index record and returns it to the server layer.

7. The server layer continues to add 1 to the count variable.

8. Repeat the above process until InnoDB returns no recordable message to the server layer.

9. The server layer sends the final value of the count variable to the client.

Summary

After I finished writing, I still felt quite depressed. There are fewer and fewer good articles that can be obtained from the public account, and now it is paid for knowledge. It's time.

I really miss the days when I first started working. At that time, I spent some time reading official account articles every morning, but now they are all advertisements. Why!

But it’s normal, no one can always generate electricity for love.

It is recommended to read more books when studying. Generally, those that can be written into books are not too bad. Now all I can search at night are the same articles, I don’t know whether they are right or wrong. Online

[Related recommendations: mysql video tutorial]

The above is the detailed content of Why is count(*) so slow? Cause Analysis. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks ago By DDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

WWE 2K25: How To Unlock Everything In MyRise

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7467

CakePHP Tutorial

1376

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

Related knowledge

The relationship between mysql user and database Apr 08, 2025 pm 07:15 PM

In MySQL database, the relationship between the user and the database is defined by permissions and tables. The user has a username and password to access the database. Permissions are granted through the GRANT command, while the table is created by the CREATE TABLE command. To establish a relationship between a user and a database, you need to create a database, create a user, and then grant permissions.

MySQL: The Ease of Data Management for Beginners Apr 09, 2025 am 12:07 AM

MySQL is suitable for beginners because it is simple to install, powerful and easy to manage data. 1. Simple installation and configuration, suitable for a variety of operating systems. 2. Support basic operations such as creating databases and tables, inserting, querying, updating and deleting data. 3. Provide advanced functions such as JOIN operations and subqueries. 4. Performance can be improved through indexing, query optimization and table partitioning. 5. Support backup, recovery and security measures to ensure data security and consistency.

How to fill in mysql username and password Apr 08, 2025 pm 07:09 PM

To fill in the MySQL username and password: 1. Determine the username and password; 2. Connect to the database; 3. Use the username and password to execute queries and commands.

Query optimization in MySQL is essential for improving database performance, especially when dealing with large data sets Apr 08, 2025 pm 07:12 PM

1. Use the correct index to speed up data retrieval by reducing the amount of data scanned select*frommployeeswherelast_name='smith'; if you look up a column of a table multiple times, create an index for that column. If you or your app needs data from multiple columns according to the criteria, create a composite index 2. Avoid select * only those required columns, if you select all unwanted columns, this will only consume more server memory and cause the server to slow down at high load or frequency times For example, your table contains columns such as created_at and updated_at and timestamps, and then avoid selecting * because they do not require inefficient query se

Can I retrieve the database password in Navicat? Apr 08, 2025 pm 09:51 PM

Navicat itself does not store the database password, and can only retrieve the encrypted password. Solution: 1. Check the password manager; 2. Check Navicat's "Remember Password" function; 3. Reset the database password; 4. Contact the database administrator.

How to view mysql Apr 08, 2025 pm 07:21 PM

View the MySQL database with the following command: Connect to the server: mysql -u Username -p Password Run SHOW DATABASES; Command to get all existing databases Select database: USE database name; View table: SHOW TABLES; View table structure: DESCRIBE table name; View data: SELECT * FROM table name;

How to create navicat premium Apr 09, 2025 am 07:09 AM

Create a database using Navicat Premium: Connect to the database server and enter the connection parameters. Right-click on the server and select Create Database. Enter the name of the new database and the specified character set and collation. Connect to the new database and create the table in the Object Browser. Right-click on the table and select Insert Data to insert the data.

How to copy tables in mysql Apr 08, 2025 pm 07:24 PM

Copying a table in MySQL requires creating new tables, inserting data, setting foreign keys, copying indexes, triggers, stored procedures, and functions. The specific steps include: creating a new table with the same structure. Insert data from the original table into a new table. Set the same foreign key constraint (if the original table has one). Create the same index. Create the same trigger (if the original table has one). Create the same stored procedure or function (if the original table is used).

See all articles