Interviewer: How to quickly query tens of millions of data?
Let’s look at an interview scenario first:
Interviewer: Let’s talk about 10 million data. How did you inquire? Little brother: Query directly by paging, use limit paging. Interviewer: Have you ever done it in practice? Little brother: There must be one
Maybe some friends have never encountered a table with tens of millions of data, and they don’t know how to query it. What happens when there is tens of millions of data?
Today I will show you how to practice. This time it is based on the MySQL 5.7.26 version for testing
Preparing data
What should I do if I don’t have 10 million data?
Create it
Code to create 10 million? That's impossible, it's too slow, and it might take a whole day. You can use database scripts to execute much faster.
Create table
CREATE TABLE `user_operation_log` ( `id` int(11) NOT NULL AUTO_INCREMENT, `user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr4` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr5` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr6` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr7` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr8` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr9` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr10` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr11` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, `attr12` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL, PRIMARY KEY (`id`) USING BTREE ) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
Create data script
Using batch insertion, the efficiency will be much faster, and every 1000 items will be committed. If the amount of data is too large, it will also lead to batch insertion. Insertion efficiency is slow
DELIMITER ;; CREATE PROCEDURE batch_insert_log() BEGIN DECLARE i INT DEFAULT 1; DECLARE userId INT DEFAULT 10000000; set @execSql = 'INSERT INTO `test`.`user_operation_log`(`user_id`, `ip`, `op_data`, `attr1`, `attr2`, `attr3`, `attr4`, `attr5`, `attr6`, `attr7`, `attr8`, `attr9`, `attr10`, `attr11`, `attr12`) VALUES'; set @execData = ''; WHILE i<=10000000 DO set @attr = "'测试很长很长很长很长很长很长很长很长很长很长很长很长很长很长很长很长很长的属性'"; set @execData = concat(@execData, "(", userId + i, ", '10.0.69.175', '用户登录操作'", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")"); if i % 1000 = 0 then set @stmtSql = concat(@execSql, @execData,";"); prepare stmt from @stmtSql; execute stmt; DEALLOCATE prepare stmt; commit; set @execData = ""; else set @execData = concat(@execData, ","); end if; SET i=i+1; END WHILE; END;; DELIMITER ;
开始测试
田哥的电脑配置比较低:win10 标压渣渣i5 读写约500MB的SSD
由于配置低,本次测试只准备了3148000条数据,占用了磁盘5G(还没建索引的情况下),跑了38min,电脑配置好的同学,可以插入多点数据测试
SELECT count(1) FROM `user_operation_log`
返回结果:3148000
三次查询时间分别为:
14060 ms 13755 ms 13447 ms
普通分页查询
MySQL 支持 LIMIT 语句来选取指定的条数数据, Oracle 可以使用 ROWNUM 来选取。
MySQL分页查询语法如下:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
第一个参数指定第一个返回记录行的偏移量 第二个参数指定返回记录行的最大数目
下面我们开始测试查询结果:
SELECT * FROM `user_operation_log` LIMIT 10000, 10
查询3次时间分别为:
59 ms 49 ms 50 ms
这样看起来速度还行,不过是本地数据库,速度自然快点。
换个角度来测试
相同偏移量,不同数据量
SELECT * FROM `user_operation_log` LIMIT 10000, 10 SELECT * FROM `user_operation_log` LIMIT 10000, 100 SELECT * FROM `user_operation_log` LIMIT 10000, 1000 SELECT * FROM `user_operation_log` LIMIT 10000, 10000 SELECT * FROM `user_operation_log` LIMIT 10000, 100000 SELECT * FROM `user_operation_log` LIMIT 10000, 1000000
查询时间如下:
Quantity | First time | Second time | Third time |
---|---|---|---|
10 items | 53ms | 52ms | 47ms |
100 items | 50ms | 60ms | 55ms |
61ms | 74ms | 60ms | |
10000 items | 164ms | 180ms | 217ms |
100000 items | 1609ms | 1741ms | 1764ms |
1000000 items | 16219ms | 16889ms | 17081ms |
From the above results we can conclude: The larger the amount of data, the longer it takes
相同数据量,不同偏移量
SELECT * FROM `user_operation_log` LIMIT 100, 100 SELECT * FROM `user_operation_log` LIMIT 1000, 100 SELECT * FROM `user_operation_log` LIMIT 10000, 100 SELECT * FROM `user_operation_log` LIMIT 100000, 100 SELECT * FROM `user_operation_log` LIMIT 1000000, 100
偏移量 | 第一次 | 第二次 | 第三次 |
---|---|---|---|
100 | 36ms | 40ms | 36ms |
1000 | 31ms | 38ms | 32ms |
10000 | 53ms | 48ms | 51ms |
100000 | 622ms | 576ms | 627ms |
1000000 | 4891ms | 5076ms | 4856ms |
从上面结果可以得出结束:偏移量越大,花费时间越长
SELECT * FROM `user_operation_log` LIMIT 100, 100 SELECT id, attr FROM `user_operation_log` LIMIT 100, 100
如何优化
既然我们经过上面一番的折腾,也得出了结论,针对上面两个问题:偏移大、数据量大,我们分别着手优化
优化偏移量大问题
采用子查询方式
我们可以先定位偏移位置的 id,然后再查询数据
SELECT * FROM `user_operation_log` LIMIT 1000000, 10 SELECT id FROM `user_operation_log` LIMIT 1000000, 1 SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10
查询结果如下:
sql | Spend time |
---|---|
First article | 4818ms |
The second article (without index) | 4329ms |
The second article (with index) ) | 199ms |
The third article (without index) | 4319ms |
The third Bar (with index) | 201ms |
从上面结果得出结论:
第一条花费的时间最大,第三条比第一条稍微好点 子查询使用索引速度更快
缺点:只适用于id递增的情况
id非递增的情况可以使用以下写法,但这种缺点是分页查询只能放在子查询里面
注意:某些 mysql 版本不支持在 in 子句中使用 limit,所以采用了多个嵌套select
SELECT * FROM `user_operation_log` WHERE id IN (SELECT t.id FROM (SELECT id FROM `user_operation_log` LIMIT 1000000, 10) AS t)
采用 id 限定方式
这种方法要求更高些,id必须是连续递增,而且还得计算id的范围,然后使用 between,sql如下
SELECT * FROM `user_operation_log` WHERE id between 1000000 AND 1000100 LIMIT 100 SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100
查询结果如下:
sql | 花费时间 |
---|---|
第一条 | 22ms |
第二条 | 21ms |
从结果可以看出这种方式非常快
注意:这里的 LIMIT 是限制了条数,没有采用偏移量
优化数据量大问题
返回结果的数据量也会直接影响速度
SELECT * FROM `user_operation_log` LIMIT 1, 1000000 SELECT id FROM `user_operation_log` LIMIT 1, 1000000 SELECT id, user_id, ip, op_data, attr1, attr2, attr3, attr4, attr5, attr6, attr7, attr8, attr9, attr10, attr11, attr12 FROM `user_operation_log` LIMIT 1, 1000000
查询结果如下:
sql | 花费时间 |
---|---|
第一条 | 15676ms |
第二条 | 7298ms |
第三条 | 15960ms |
It can be seen from the results that by reducing unnecessary columns, the query efficiency can also be significantly improved
The first and third query speeds are almost the same. At this time, you will definitely complain, then I Why write so many fields? Just * and you’re done.
Note that my MySQL server and client are on the same machine, so the query data is similar. Qualified students can test it. Test the client separately from MySQL
SELECT * Isn’t it nice?
By the way, I would like to add why SELECT * is prohibited. Isn't it delicious because it's simple and mindless?
Two main points:
Using "SELECT *" the database needs to parse more objects, fields, permissions, attributes and other related content. When the SQL statement is complex and hard When there are many parses, it will cause a heavy burden on the database. Increase network overhead, * Sometimes useless and large text fields such as log and IconMD5 are mistakenly added, and the data transmission size will increase geometrically. Especially since MySQL and the application are not on the same machine, this overhead is very obvious.
End
Finally, I hope you all can practice it yourself, and you will definitely gain more!
The above is the detailed content of Interviewer: How to quickly query tens of millions of data?. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

You can open phpMyAdmin through the following steps: 1. Log in to the website control panel; 2. Find and click the phpMyAdmin icon; 3. Enter MySQL credentials; 4. Click "Login".

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

Apache connects to a database requires the following steps: Install the database driver. Configure the web.xml file to create a connection pool. Create a JDBC data source and specify the connection settings. Use the JDBC API to access the database from Java code, including getting connections, creating statements, binding parameters, executing queries or updates, and processing results.

The process of starting MySQL in Docker consists of the following steps: Pull the MySQL image to create and start the container, set the root user password, and map the port verification connection Create the database and the user grants all permissions to the database

Installing MySQL on CentOS involves the following steps: Adding the appropriate MySQL yum source. Execute the yum install mysql-server command to install the MySQL server. Use the mysql_secure_installation command to make security settings, such as setting the root user password. Customize the MySQL configuration file as needed. Tune MySQL parameters and optimize databases for performance.

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.
