Text:
1. First, let’s talk about what is an index?
Index is translated as a directory, used to quickly locate the location of the data we want to find. For example: we compare a database to a book, and the index is the table of contents in the book. If we want to find an interesting content in the book, we generally will not read the entire book and then confirm the content. Instead, locate the page number of the content chapter through the book's table of contents, and finally turn to the page directly
Let's take a look at the index in the database:
Full table scan VS index scan
Take a dictionary as an example, Full table scan means that if we search for a certain word, then we read through the Xinhua Dictionary and then find the word we want to find
And the counterpart to full table scan is index search. Index search means to find the word we want in the index part of the table. The specific location of the data we are looking for, and then we will go to the table to find out all the data we want to find
Example: Find a student named Dev in a student table
Scan the full table on the left: Need to scan from The first row starts scanning row by row until the information of the student Dev in row 100008 is found, and this data is returned. However, there may be students with the same name in the table, so the scan does not end. Usually the full table scan needs to find For a piece of data, you need to traverse the entire table of data before you can determine whether to return all the data
Right index scan: Index search is to find the Dev starting with D according to the first letter sorting. If the first letters are the same, then according to the first letter Sort by two letters to find, and so on, we find the ID is 100008, and then go back to the table to find the data with ID 100008
Conclusion: Therefore, the index value of the index (corresponding to InnoDB) corresponds to the primary key ID
2. How to find The value corresponding to the index
InnoDB engine is mainly based on
(1)B+tree
(2) binary search method
B+tree: B+tree has the root node, branch node and page node of the entire tree, the upper layer The management scope of the lower-level nodes will be stored until the specific information of the page node
Binary search method: Compare according to the range of each node stored in the B+ tree, gradually narrow the range, and finally locate the position we want in the page node
3. Let’s introduce that the InnoDB table is also an index table
As shown above, the InnoDB table is a clustered table, which means that InnoDB itself is a large index-organized table, and it is also a B+ tree structure of a large index sorted according to the primary key. We create additional fields in the table we want to index in InnoDB
The clustered index means the InnoDB table itself, and we call these indexes sorted according to other fields called secondary indexes (secondary class)
4. How to create indexes in the database
There are two main types of indexes in MySQL
1. Single column index
create index idx_name on tb_student(name); 索引名 表名 字段名
2. Union index
create index idx_name_age on tb_student(name,age); #索引中先根据name排序,name相同的情况下根据age排序
五、索引维护
首先介绍下什么是索引维护?这是一个关乎性能的重要概念
如果索引所在字段发生了修改、删除、插入等操作,那么索引项就会发生变化,因此如果不能保证索引的有序,那么就不能索引的准确与效率,而索引的排序发生了变化的这个行为,我们称为索引维护 在insert/delete/update操作时,为了维护索引的排序,数据库会自动的完成索引项的维护,索引的排序,这些行为对用户是透明的,感觉不到的 在一个有索引的表中,创建它时,实际上还同时创建了索引排序的表,因此在DML中,插入等操作不再是普通的插入,MySQL将它封装成了一个事务,连着索引项的排序表一起操作 因此,我们应当严格控制表上的索引数量,否则容易影响数据库的性能
总结索引维护如下:
1、索引维护由数据库自动完成
2、插入/修改/删除每一个索引行都变成一个内部封装的事务
3、索引越多,事务越大,代价越高
4、索引越多,对表的插入和索引字段的修改就越慢
因此可以看出索引并非是越多越好,在工作中也要慎用,尤其对于写操作较为频繁的业务
六、如何正确的使用索引?
1、依据where查询条件建立索引
eg: select a,b from tb_test where c = ?; idx_c(c) ->正确 select a,b from tb_test where c = ? and b = ? idx_cd(c,d) ->正确
2、根据排序order by ,group by , distinct 字段添加索引
eg: select * from tb_test order by a; select a,count(*) from tb_test group by a; idx_a(a) ->正确 select * from tb_test order by a,b; idx_a_b(a,b) ->正确 select * from tb_test order where c = ? by a; idx_c_a(c,a) ->正确
七、到底哪些字段适合创建索引?
1、字段值的重复程度,如图:
身份证号码由于基本上不可能重复,因此选择性非常好,而人的名字重复性较低,选择性也不错, 性别选择性较差,重复度非常高
2、选择性很差的字段通常不适合创建索引,但也有例外
如:男女比例相仿的表中,性别不适合创建单列索引,如果走索引不如走全表扫描, 因为走索引的I/O开销更大 但如果男女比例极度不平衡,要查询的又是少数方,如:理工学校、IT公司等可以考虑使用索引
3、联合索引中选择性好的字段应该排在前面
select * from tab_a where gender=? and name=? idx_name_gender(name,gender) ->正确
4、联合索引可以为单列、复列查询提供帮助
idx_smp(a,b,c) where a=?; ->正确 where a=? and b=?; ->正确 where a=? and c=?; ->正确 (注:需要MySQL5.6版本以上;在5.5及以前版本,可以对a字段进行索引扫描,但c字段不行 ) where a=? and b=? and c=? ->正确
5、合理创建联合索引,避免冗余
(a),(a,b),(a,b,c) ->不可取 (a,b,c) ->正确,可以覆盖前两个
八、再来看看如何在长字段上建立索引呢
首先,在较长的字段上建立索引是非常影响性能的,比如文章等超大varchar或者text字段,如果不是非建不可,一般不推荐,另外对InnoDB索引单字段(utf8)只能取前767bytes
那么如何处理长字段索引?
主要根据类型来分别处理: 1、Email类,可以建立前缀索引 mail_addr varchar(2048) idx_mailadd(mail_addr(39)) -> 正确 解析:由于email邮件类型字段,一般后缀都有较大可能相同,如.com .cn等等,而前缀相同的可能性较低,且邮箱一般长度较短,因此可以建立前缀索引 2、住址类,分拆字段 home_addr varchar(2048) idx_homeadd(home_addr(30)) ->错误,很可能前半段是相同的省市区街道名 province_add varchar(1024),city_add varchar(1024), district_add `varchar(1024),lolcal_add varchar(1024) --建立联合索引或者单列索引 ->正确`
九、对核心SQL索引做覆盖扫描
对于最核心的SQL,我们可以考虑使用索引覆盖,什么是索引覆盖呢,下面是个例子
select name from tb_user where userid=? key idx_uid_name(userid,name) ->覆盖索引扫描
我们查询用户名这种操作频率非常高,而索引里面又存储了字段的值,因此在我们做查询时,name字段的值直接在索引中返回,而不需要回表;还有一个使用非常广泛的例子:用户登陆,我们可以将username password做覆盖索引,这样大大提高登陆验证的速度
因此覆盖索引覆盖就是将你要查询的字段和条件字段一起建立联合索引,这样的好处是不需要回表获取name字段,IO最小,速度块
十、哪些情况无法使用索引?
1、索引列进行数据运算或者函数运算
eg: where id+1=10; ->错误,无法利用到索引 where id=(10-1) ->正确 where year(id) < 2016 ->错误,无法利用到索引 where col < '2016-01-01' ->正确
2、未含复合索引的前缀字段
idx_abc(a,b,c) where b=? and c=? ->错误,无法利用到索引 正确的建立索引方式(b,c)
3、前缀通配符"_" "%"等
like '%ttt%' ->错误,无法利用到索引 like "ttt%" ->正确
4、where条件使用NOT,<>,!= 通常也无法使用到索引
5、字段类型不匹配
字段类型并不绝对匹配时,可能会导致无法使用索引 a int(11) ,idx_a(a) where a = '123' ->错误,可能导致未知的错误,这个跟编码有关系 where a = 123 ->正确
十一、利用索引做排序操作
以 idx_ab(a,b)索引为例
1、能使用上述索引进行排序的操作是:
order by a; a = 3 order by b; order by a,b; order by a desc ,b desc; a > 5 order by a;
2、不能使用索引帮助排序的查询
order by b; #没有使用到联合索引的第一个字段 a > 5 order by b; #一旦前缀操作是一个range而非=操作,那么就无法利用到索引, 这里 a>5无法利用索引,二联合索引的第一个字段未利用, 因此 order by b也无法利用索引查询 a in (1,3) order by b; #in里面的值没有建立索引,因此无法利用索引,a未用因此order by b也无法使用 order by a asc, b desc; #这里order by a esc是利用了索引,但是b desc未利用到,因为b要和a排序方式一致才可利用到索引
十二、如何确定一个查询有没有走索引,走了哪些索引?
MySQL中自带命令行工具 explain 来查看一个sql语句是否了索引
使用方式:
explain select * from tb_test;
关注的项:
1、type : 查询access的方式,表的连接类型 index | 索引 full | 全表扫描 ref | 参照查询,也就是等值查询 range | 范围查询 2、key : 本次查询最终选择使用哪个索引,NULL为未使用索引 3、key_len : 选择的索引使用的前缀长度或者整个长度 4、rows : 查询逻辑扫描过的记录行数 5、extra : 额外信息,主要是指fetch data的具体方式
总结:索引的本质还是提升我们查询数据库的速度,减少服务器I/O开销,提供更稳定快捷的服务