MySQL”海量数据”查询性能分析_MySQL-MySQL-Tutorial-php.cn

MySQL”海量数据”查询性能分析_MySQL

WBOY

Freigeben： 2016-06-01 13:50:27

Original

1107 Leute haben es durchsucht

bitsCN.com

最近做了一次MySQL所谓的”海量数据”查询性能分析.

表结构

dt dt2 dt3 it it2 it3 id id id id id id int PK ext1 ext1 varchar(256) time time time time time time int/datetime KEY ext2 ext2 ext2 ext2 varchar(128)

说明, MyISAM引擎, dt表示时间字段使用datetime类型, it表示时间字段使用int类型.

初始数据

首先生成100K个UNIX时间戳(int), 然后随机选取10M次, 每一次往6个表里插入一条记录(当time字段是datetime类型时, 做类型转换). 所以每一个表都有10M条记录. ext1和ext2字段会用随机的字符串填充.

SQL查询

使用的查询SQL语句如:

select SQL_NO_CACHE count(*) from it where time>10000;select SQL_NO_CACHE count(*) from dt where time>from_unixtime(10000);select SQL_NO_CACHE * from it where time>10000 order by time limit 1;select SQL_NO_CACHE * from it use key(PRIMARY) where time>10000 order by id limit 1;

Nach dem Login kopieren

SQL_NO_CACHE用于消除查询结果缓存的影响. use key用于指定查询时使用的索引. 统计每一条SQL的执行时间(单位s)和满足WHERE条件的记录总数(total), it-tm表示在dt表上执行SQL的耗时, 并explain得到key和extra, 结果如下.

where total select orderby key it-tm dt-tm it2-tm dt2-tm it3-tm dt3-tm extra time>10000 8999050 count(*) time 3.52 4.28 3.74 4.49 3.53 4.47 where; index count(time) time 3.44 4.00 3.69 4.36 3.56 4.26 where; index count(id) NULL 1.44 1.92 4.30 4.60 4.79 4.98 where * time time 0.00 0.00 0.00 0.00 0.00 0.00 where * id time 14.81 15.38 19.37 20.30 20.94 21.42 where; filesort * id PK 0.00 0.03 0.00 0.02 0.00 0.04 where time>50000 4987990 count(*) 1.90 2.36 2.02 2.41 1.99 2.42 count(time) 1.90 2.23 2.01 2.32 1.96 2.29 count(id) 1.48 1.91 4.25 4.61 4.80 5.12 * time 0.00 0.00 0.00 0.00 0.00 0.00 * id 8.15 8.77 10.74 11.36 11.59 11.79 * id 0.00 0.00 0.00 0.00 0.00 0.00 time>80000 1991982 count(*) 0.76 0.95 0.83 0.98 0.80 1.00 count(time) 0.77 0.91 0.81 0.91 0.83 0.92 count(id) 1.38 1.86 4.17 4.49 4.71 5.02 * time 0.00 0.00 0.00 0.00 0.00 0.00 * id 3.26 3.44 4.26 4.51 4.56 4.76 * id 0.00 0.00 0.00 0.00 0.00 0.00 time>99900 10871 count(*) 0.00 0.00 0.00 0.01 0.01 0.00 count(time) 0.01 0.01 0.01 0.00 0.01 0.01 count(id) 0.01 0.01 0.02 0.03 0.02 0.02 * time 0.00 0.00 0.00 0.00 0.00 0.00 * id 0.02 0.02 0.03 0.03 0.03 0.03 * id 0.00 0.00 0.00 0.00 0.00 0.00

分析和结论

count(*), count(time)和count(id)的对比. 结果有较大变化. 当表的字段只有2个且查询条件较宽松(即符合条件的记录数较多)时, count(id)比count(*)快很多, 但是, 当表中还有其它的字段时, count(id)反而更慢了. 虽然id是主键, time是索引列, 但是select count(id) where time并没有用到索引, 而是进行全表扫描. 当对count(*)进行ignore key(time)时, 查询时间和count(id)相同.证明当结果集较大时索引导致查询变慢,应该是全表扫描进行的是连续的磁盘IO和内存操作, 而使用索引是进行随机的磁盘IO和内存操作, 并且MyISAM存储索引的BTree结构占用更多的空间. 当WHERE条件约束更严格, total的值小到一定程度时, 全表扫描比使用索引慢, 因为索引极大减少了磁盘IO和内存操作.

排序字段和索引的使用. 当有排序且LIMIT(偏移为0)时,如果查询时使用的索引不是排序字段的索引, 那么速度非常慢. 当偏移不为0时, 如果使用排序列的索引, 要考虑偏移可能导致扫描的记录数, 所以应该根据情况选取合适的索引.

判断符合条件的记录是否存在, 使用select * limit 1速度要比select count(*)计数快得多.

时间字段类型的选择. int比datetime快, 但差距不是很明显.

无论如何, 条件限制得越严格, 查询就会越快.

另外, 根据随机id更新时, 大约能达到5K行/s.

列的先后顺序对查询性能的影响也非常大.

bitsCN.com