There is a tablemessages
, the data contained in it is as follows:
Id Name Other_Columns ------------------------- 1 A A_data_1 2 A A_data_2 3 A A_data_3 4 B B_data_1 5 B B_data_2 6 C C_data_1
If I run the query select * from messages group by name
, the results I will get are:
1 A A_data_1 4 B B_data_1 6 C C_data_1
What query would return the following results?
3 A A_data_3 5 B B_data_2 6 C C_data_1
That is, the last record in each group should be returned.
Currently, this is the query I use:
SELECT * FROM (SELECT * FROM messages ORDER BY id DESC) AS x GROUP BY name
But this seems very inefficient. Are there any other ways to achieve the same result?
UPD: 2017-03-31, Version 5.7.5 MySQL enables the ONLY_FULL_GROUP_BY switch by default (so non-deterministic GROUP BY queries are disabled). Additionally, they updated the GROUP BY implementation and the solution may not work as expected even with the switch disabled. Need to check it out.
Bill Karwin's above solution works fine when the number of items within the group is fairly small, but when the group is fairly large the performance of the query becomes poor as the solution requires approximately n*n/2 n/ 2 Only compare
IS NULL
.I tested on an InnoDB table containing
18684446
rows and1182
groups. This table contains test results for functional tests and has(test_id, request_id)
as the primary key. So,test_id
is a group and I am searching for the lastrequest_id
for eachtest_id
.Bill's solution has been running on my Dell e4310 for a few hours now, and although it's running on a covered index (so using the index in EXPLAIN), I don't know when it will be complete.
I have a couple of other solutions based on the same idea:
(group_id, item_value)
pair is the last value in eachgroup_id
, i.e. if We traverse the index in descending order, which is the first one of eachgroup_id
;3 Ways MySQL Uses Indexes is a great article to help you understand some of the details.
Solution 1
This is incredibly fast, taking about 0.8 seconds on my 18M rows:
If you want to change the order to ASC, put it in a subquery that returns only the ids and use it as a subquery to join the rest of the columns:
This takes about 1.2 seconds for my data.
Solution 2
Here's another solution that took about 19 seconds for my table:
It also returns tests in descending order. It's much slower because it performs a full index scan, but it gives you an idea of how to output the N maximum rows for each group.
The disadvantage of this query is that the query cache cannot cache its results.
MySQL 8.0 now supports window functions, such as almost all popular SQL implementations. Using this standard syntax, we can write up to n queries per group:
This method and other methods of finding the maximum number of rows grouped are described in the MySQL manual.
The following is the original answer I wrote to this question in 2009:
I wrote the solution like this: