First, let’s look at a question. For table t, it contains three fields a, b, c. Assuming that their default values are not empty, now create a combined index index (a, b, c) analysis What is the difference between select * from t where a=1 and c=1 and select * from t where a=1 and b=1?
First create the table
##Execute these two items respectively Statement
It is found that the difference between the two is mainly in key_len. Why are the differences between the two different?
My understanding is:
We can think of the combined index as the first-level directory, the second-level directory, and the third-level directory of the book, such as index(a,b,c) is equivalent to a being a first-level directory, b being a second-level directory under the first-level directory, and c being a third-level directory under the second-level directory. To use a directory, you must first use its superior directory, except for the first-level directory.
So
where a=1 and c=1 only uses the first-level directory, c is in the third-level directory, no If you use the second-level directory, then the third-level directory cannot be used
where a=1 and b=1 only use A first-level directory and a second-level directory.
So the key_len of the second query is larger.
#But how is key_len calculated? How do we calculate 4 and 8 above? I haven’t paid much attention to it before. When analyzing the performance of SQL query statements through explain, I paid more attention to select_type, type, possible_key, key, ref, rows, and extra this time. I feel it is necessary to clarify the calculation of key_len.
1. For all index fields, if not null is not set, one byte needs to be added.
#2. Fixed-length field, int occupies four bytes, date occupies three bytes, and char(n) occupies n characters.
#3. For the field varchar(n), there are n characters + two bytes. 4. Different character sets, the number of bytes occupied by a character is different. In latin1 encoding, one character occupies one byte, in gbk encoding, one character occupies two bytes, and in utf8 encoding, one character occupies three bytes.
can be derived
where a=1 and c= For 1, key_len=4
where a=1 and c=1, key_len=4+4=8
# Now let’s do another question, Create a t2 table, the data structure is as follows
##Please execute explain select * from t2 where name="001 " and id=1 \G; What is the key_len?
Analysis key_len=4+5*1+2=11, because the fields are not null, the int type is 4 bytes, and varchar(5) occupies 5 characters + 2 Bytes, one character of latin1 encoding table occupies 1 byte, so varchar(5) occupies 7 bytes. The structure is as shown below
Because MySQL has a query optimizer, for where a=1 and c=1 type queries, fields The order has no effect, the query optimizer will optimize automatically. where c=1 and a=1 will be optimized to where a=1 and c=1, but it is recommended to use where
a=1 and c=1, for easy understanding and query buffering. Because query buffering and hashkey values are calculated based on SQL statements and are case-sensitive, when writing SQL statements, try to keep them consistent to prevent the same query from being cached multiple times. ##Supplement