You made a wise choice
Understanding indexes is extremely important for both developers and DBAs
Poor indexing is responsible for a considerable share of production problems
Indexing is not that advanced a topic
MySQL index overview
Understand indexes
Create the best index for your application
Embrace the limitations of MySQL
Introduction to indexes
What are indexes used for?
Speed up reading data from the database
Enforce constraints (UNIQUE index, FOREIGN KEY)
A query returns correct results without any index
But it may take a long time to execute
Index types you may have heard of
BTREE index – the main index type in MySQL
RTREE index – only supported by MyISAM, for GIS
HASH index – supported by MEMORY and NDB
BITMAP index – not supported by MySQL
FULLTEXT index – MyISAM, InnoDB (MySQL 5.6 or above)
BTREE-like index family
Many different implementations
They share the same properties regarding which operations they can speed up
Whether the index sits in memory or on disk makes a big difference
B+Tree is usually used for hard disk storage
Data is stored in leaf nodes
B+Tree example
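As a rough illustration of why ordered storage helps, here is a minimal sketch in Python. It is not a real B+Tree: a sorted list stands in for the ordered chain of leaf nodes, and the keys, row labels and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical leaf level: (key, row) pairs kept in key order, the way the
# leaf nodes of a B+Tree keep them. A real B+Tree adds inner nodes on top
# so that the starting leaf can be found without scanning.
leaves = sorted([(5, "row-e"), (1, "row-a"), (9, "row-i"), (5, "row-f"), (7, "row-g")])
keys = [k for k, _ in leaves]

def point_query(key):
    """Return all rows whose key equals `key`."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [row for _, row in leaves[lo:hi]]

def range_query(low):
    """Return all rows whose key is strictly greater than `low`."""
    return [row for _, row in leaves[bisect_right(keys, low):]]
```

Both functions locate a starting position by binary search and then read neighbouring entries in order, which is the access pattern an ordered index makes cheap.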
MyISAM and InnoDB index comparison
MyISAM
The data pointer points to the physical location in the data file
All indexes are the same (each points to the physical location)
InnoDB
Primary key index (explicit or implicit) – stores the data directly in the leaf nodes of the index instead of pointers
Secondary index – stores the primary key value as the data pointer
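The two lookup paths can be sketched with plain Python containers. This is a simplified model under stated assumptions, not either engine's actual storage format: the rows, the `last_name` index and all function names are hypothetical, a list stands in for the MyISAM data file, and a dict stands in for the InnoDB clustered index.

```python
from bisect import bisect_left

# Hypothetical rows: (id, last_name)
rows = [(3, "Smith"), (1, "Jones"), (2, "Smith")]

# MyISAM-style: rows sit in a data file in insertion order, and every
# index stores the row's position in that file.
data_file = list(rows)
myisam_name_idx = sorted((name, pos) for pos, (_, name) in enumerate(data_file))

# InnoDB-style: the primary key index holds the rows themselves (a dict
# stands in for the clustered B+Tree), and the secondary index stores the
# primary key value as its "pointer".
clustered = {pk: (pk, name) for pk, name in rows}
innodb_name_idx = sorted((name, pk) for pk, name in rows)

def _matches(index, name):
    """Yield the pointer part of every index entry whose key equals name."""
    i = bisect_left(index, (name,))
    while i < len(index) and index[i][0] == name:
        yield index[i][1]
        i += 1

def myisam_lookup(name):
    # One step: the pointer is the position in the data file.
    return [data_file[pos] for pos in _matches(myisam_name_idx, name)]

def innodb_lookup(name):
    # Two steps: secondary index entry, then a primary key lookup.
    return [clustered[pk] for pk in _matches(innodb_name_idx, name)]
```

In this model the InnoDB-style lookup returns equal names in primary key order, because the secondary index entries are effectively sorted by (last_name, id).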
What operations can the BTREE index be used for?
Query all records with KEY=5 (point query)
Query all records with KEY>5 (open range)
Query all records with 5<KEY<10 (closed range)
Not applicable to: query all records where the last digit of KEY is 0
This cannot be expressed as a range query
String indexes
They work no differently from numeric indexes
The collation defines the sort order of strings, for example "AAAA" < "AAAB"
A prefix LIKE is a special kind of range query
LIKE "ABC%" means "ABC[minimum value]" <= KEY <= "ABC[maximum value]"
LIKE "%ABC" cannot use the index
Composite indexes
Entries are sorted by comparing the first column, then the second, then the third, and so on
KEY(col1,col2,col3): (1,2,3) < (1,3,1)
It is a single BTREE index, not a separate BTREE index for each level
Overhead of indexes
Indexes are expensive; do not add redundant ones
In most cases, extending an existing index is better than adding a new one
Writes - updating indexes is often the main cost of database write operations
Reads - indexes take up disk and memory space, and add work to query optimization
What affects the cost of an index
Long primary keys
Indexes with insertion in random order – SHA1('password')
Low-selectivity indexes are inferior – an index on a gender field
Correlated indexes are less expensive – insert_time is correlated with an auto-increment id
Indexes on InnoDB tables
Data is clustered by primary key
Choose the best field(s) as the primary key
For example, for a comments table, (POST_ID, COMMENT_ID) is a good primary key: the comments of a single post are clustered together
Alternatively, "pack" the two into a single BIGINT field
The primary key is implicitly appended to every index
KEY (A) is essentially KEY (A,ID)
This is useful for sorting and for covering indexes
How MySQL uses indexes
Lookups
Sorting
Avoiding reading the data (reading only the index)
Other specialized optimizations
Using an index for lookups
SELECT * FROM EMPLOYEES WHERE LAST_NAME = "Smith"
This is the classic use of an index: KEY (LAST_NAME)
Composite indexes can be used as well
SELECT * FROM EMPLOYEES WHERE LAST_NAME="Smith" AND DEPT="Accounting"
The index KEY(DEPT,LAST_NAME) will be used
Composite indexes are more complicated
Index (A,B,C) - the order of the fields matters
The index is used for lookup in these cases (for all conditions)
A>5
A=5 AND B>6
A=5 AND B=6 AND C=7
A=5 AND B IN (2,3) AND C>5
The index is not used for these conditions
B>5 - there is no condition on A, which comes before B in the index
B=6 AND C=7 - there is no condition on A, which comes before B and C in the index
Only part of the index is used in these cases
A>5 AND B=2 - the range condition on the first field means only the A part of the index is used
A=5 AND B>6 AND C=2 - the range condition on B means only the A and B parts of the index are used
The first rule of the MySQL optimizer
In a composite index, MySQL stops using the key parts to the right of the first range condition (<, >, BETWEEN); with IN (...) it can continue to use key parts to the right
Using indexes for sorting
SELECT * FROM PLAYERS ORDER BY SCORE DESC LIMIT 10
This will use the index KEY(SCORE)
Without an index, MySQL performs a very expensive "filesort" operation (external sort)
Sorting is often combined with using the index for lookup
SELECT * FROM PLAYERS WHERE COUNTRY="US" ORDER BY SCORE DESC LIMIT 10
The best choice is KEY(COUNTRY, SCORE)
Efficient sorting with a composite index is more restricted!
KEY(A,B)
The index is used for sorting in these cases
ORDER BY A - sort on the first field of the index
A=5 ORDER BY B - point query on the first field, sort on the second field
ORDER BY A DESC, B DESC - sort on both fields in the same direction
A>5 ORDER BY A - range query on the first field, sort on the first field
The index is not used for sorting in these cases
ORDER BY B - sort on the second field (the first field is not constrained)
A>5 ORDER BY B - range query on the first field, sort on the second field
A IN(1,2) ORDER BY B - IN query on the first field, sort on the second field
ORDER BY A ASC, B DESC - sort on the two fields in different directions
MySQL's rules for sorting with an index
Two fields cannot be sorted in different directions
Fields that are not part of the ORDER BY may only have point conditions (=) – IN() does not work in this case either
Avoiding reading the data (reading only the index)
"Covering index" – refers to an index that suits a specific query, not to a type of index
Only the index is read, not the data
SELECT STATUS FROM ORDERS WHERE CUSTOMER_ID=123
KEY(CUSTOMER_ID,STATUS)
The index is usually smaller than the data itself
The index is read more sequentially – following data pointers usually means random reads
MIN/MAX optimization
Indexes help optimize the aggregate functions MIN()/MAX() – but only in these cases:
SELECT MAX(ID) FROM TBL;
SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY DEPT_ID
The second query benefits from KEY(DEPT_ID,SALARY): "Using index for group-by"
Using indexes in joins
MySQL executes joins as "nested loops"
SELECT * FROM POSTS,COMMENTS WHERE AUTHOR="Peter" AND COMMENTS.POST_ID=POSTS.ID
Scan table POSTS for all posts that match the condition
Loop over those posts and, for each one, look up all of its comments in table COMMENTS
It is very important that every join is indexed on the joined field
An index is needed only on the table being looked up – the index on POSTS.ID is not used in this query
Redesign join queries that cannot be indexed well
Using multiple indexes
MySQL can use more than one index: "index merge"
SELECT * FROM TBL WHERE A=5 AND B=6
This can use KEY(A) and KEY(B) separately, but KEY(A,B) is a better choice
SELECT * FROM TBL WHERE A=5 OR B=6
Both indexes are used separately at the same time; KEY(A,B) cannot be used for this query
Prefix indexes
You can create an index on the leftmost prefix of a field
ALTER TABLE TITLE ADD KEY(TITLE(20));
Required for indexing BLOB/TEXT fields
Can significantly reduce space usage
Cannot be used as a covering index
Choosing the prefix length becomes the question
Choosing the prefix length
The prefix should be selective enough
Compare the number of distinct prefixes with the number of distinct values of the whole field
mysql> select count(distinct(title)) total,count(distinct(left(title,10))) p10,count(distinct(left(title,20))) p20 from title;
1 row in set (44.19 sec)
Check for outliers
Make sure there are not many records sharing the same prefix
The most common titles
mysql> select count(*) cnt, title tl from title group by tl order by cnt desc limit 3;
3 rows in set (27.49 sec)
The most common title prefixes
mysql> select count(*) cnt, left(title,20) tl from title group by tl order by cnt desc limit 3;
3 rows in set (33.23 sec)
How does MySQL choose which index to use?
The choice is made dynamically for each query – the constants in the query text matter
MySQL estimates the number of rows to be accessed for a given index by doing a "dive" into the table
If a dive is not possible, it uses the "Cardinality" statistics - these are updated by ANALYZE TABLE
More on index selection
It is not just about minimizing the number of rows scanned
Many other heuristics and hacks:
Primary keys are very important for InnoDB
Covering indexes have benefits
All else being equal, a full table scan is faster
Whether the index can also be used for sorting
Things to keep in mind
Verify the execution plan MySQL actually uses
Note that it can change dynamically based on constants and data
Use EXPLAIN
EXPLAIN is a good tool for seeing how MySQL will execute a query
mysql> explain select max(season_nr) from title group by production_year;
1 row in set (0.01 sec)
http://dev.mysql.com/doc/refm...
Remember, the actual execution may differ from the plan shown
MySQL Explain 101
"type", sorted from best to worst: system, const, eq_ref, ref, range, index, ALL
Note "rows" - larger values mean slower queries
Check "key_len" - it shows which parts of the index are actually used
Look out for "Extra"
Using Index - good
Using Filesort, Using Temporary - bad
Indexing strategy
Build indexes for the set of your performance-critical queries – look at them as a whole instead of one by one
Ideally all WHERE conditions and join conditions use indexes – at least the most selective parts should
Generally speaking, extend an existing index where possible instead of creating a new one
Remember to verify the impact on performance when making changes
Indexing strategy example
Build indexes in an order that supports more queries
SELECT * FROM TBL WHERE A=5 AND B=6
SELECT * FROM TBL WHERE A>5 AND B=6
For these two queries, KEY(B,A) is the better choice
Put all the fields used in point queries first in the index
Do not add indexes for queries that are not performance critical – too many indexes will slow down MySQL
Trick #1: Enumerating ranges
KEY (A,B)
SELECT * FROM TBL WHERE A BETWEEN 2 AND 4 AND B=5
Uses only the first key part of the index
SELECT * FROM TBL WHERE A IN (2,3,4) AND B=5
Uses both key parts of the index
Trick #2: Adding a fake filter
KEY (GENDER,CITY)
SELECT * FROM PEOPLE WHERE CITY="NEW YORK"
Cannot use the index at all
SELECT * FROM PEOPLE WHERE GENDER IN ("M","F") AND CITY="NEW YORK"
Will use the index
This trick works well for low-selectivity fields: gender, status, boolean types, etc.
Trick #3: Replacing a filesort with a UNION
KEY(A,B)
SELECT * FROM TBL WHERE A IN (1,2) ORDER BY B LIMIT 5;
Cannot use the index for sorting
(SELECT * FROM TBL WHERE A=1 ORDER BY B LIMIT 5) UNION ALL (SELECT * FROM TBL WHERE A=2 ORDER BY B LIMIT 5) ORDER BY B LIMIT 5;
Uses the index for sorting, and the "filesort" is applied to at most 10 rows
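To see why the UNION rewrite in Trick #3 returns the same rows while sorting far fewer, here is a small Python model. It is a sketch under stated assumptions, not MySQL's executor: a sorted list of (A, B) tuples stands in for KEY(A,B), and the data and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical KEY(A,B): index entries (A, B) kept in sorted order.
index = sorted(
    [(1, b) for b in (10, 40, 20, 70, 90, 30, 60)]
    + [(2, b) for b in (15, 5, 55, 35, 95, 25)]
    + [(3, b) for b in (1, 2)]
)

def first_n_for_a(a, n):
    """Read up to n entries with A = a; within one A they are ordered by B."""
    i = bisect_left(index, (a,))
    out = []
    while i < len(index) and index[i][0] == a and len(out) < n:
        out.append(index[i])
        i += 1
    return out

def in_list_query(a_values, n):
    """A IN (...) ORDER BY B LIMIT n: every matching entry must be sorted."""
    matching = [e for e in index if e[0] in a_values]
    return sorted(matching, key=lambda e: e[1])[:n], len(matching)

def union_query(a_values, n):
    """One branch per A value, each limited to n; only the union is sorted."""
    candidates = [e for a in a_values for e in first_n_for_a(a, n)]
    return sorted(candidates, key=lambda e: e[1])[:n], len(candidates)
```

In this model both functions return the same five rows for `a_values=(1, 2)` and `n=5`, but the first sorts all 13 matching entries while the second sorts only the 10 candidates produced by the two limited branches.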