I know the importance of indexes and how join order changes performance. I've read a lot related to multi-column indexes but didn't find an answer to my question.
I'm curious if the order in which they are specified matters if I do a multi-column index. My guess is no, and the engine will treat them as a group where the order doesn't matter. But I want to verify it.
For example, from the mysql website (http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html)
CREATE TABLE test ( id INT NOT NULL, last_name CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (last_name,first_name) );
In any case, would the following be better or equivalent, and would it be of any benefit?
CREATE TABLE test ( id INT NOT NULL, last_name CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (first_name,last_name) );
Specifically:
INDEX name (last_name,first_name)
Compared
INDEX name (first_name,last_name)
The two indexes are different. This is true in MySQL and other databases. MySQL explains the difference in the documentation.
Consider two indexes:
Both should apply equally to:
idx_lf is optimal for:
idx_fl will be best suited for the following situations:
For many such cases, it is possible to use two indexes, but one of them is optimal. For example, consider using the idx_lf query:
MySQL can use idx_lf to read the entire table and then filter after
order by
. I don't think this is an optimization option in practice (for MySQL), but it may happen in other databases.When discussing multi-column indexes, I use the analogy of a phone book. A phone book is basically an index with last name first and then first name. So the sort order is determined by which "column" comes first. Searches are divided into the following categories:
If you are looking for someone with the last name Smith, you can easily find them because the book is sorted by last name.
If you are looking for someone named John, the phone book is not helpful because John is scattered throughout the book. You have to scan the entire phone book to find them.
If you are looking for someone with a specific last name Smith and a specific first name John, this book will help because you will find that the Smiths are sorted together, and within the Smiths, the Johns are also sorted turn up.
If you had a phone book sorted by first name and then by last name, the sorting of the phone book would help you in cases #2 and #3 above, but not in case #1 provide help.
This explains the case of finding an exact value, but what if you're looking by a range of values? Suppose you want to find all people whose first name is John and whose last name begins with "S" (Smith, Saunders, Staunton, Sherman, etc.). Johns are sorted by "J" within each last name, but if you want all Johns with all last names starting with "S", the Johns are not grouped together. They're spread out again, so you end up having to scan for all the names whose last name starts with "S." However, if the phone book was organized by first name and then by last name, you would find that all Johns would be grouped together, and then within John, all the "S" surnames would be grouped together.
Therefore, the order of columns in a multi-column index absolutely matters. A type of query may require a specific column order of the index. If you have multiple types of queries, you may need multiple indexes to help them, with the columns in different order.
You can read my presentation How to Really Design an Index to learn more, or watch the video.