I have a table of documents (here is a simplified version):
id | rev | content |
---|---|---|
1 | 1 | ... |
2 | 1 | ... |
1 | 2 | ... |
1 | 3 | ... |

How do I select one row per `id`, keeping only the row with the greatest `rev`?
With the data above, the result should contain two rows: `[1, 3, ...]` and `[2, 1, ...]`. I'm using MySQL.
Currently, I check each row in a `while` loop to detect and overwrite older revs in the result set. But is this the only way to get the result? Isn't there an SQL solution?
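For comparison, the client-side loop described above might look something like this minimal Python sketch. It assumes rows arrive as `(id, rev, content)` tuples and uses the sample data from the table with placeholder content strings, since the real content is elided; the function name `latest_revisions` is hypothetical.

```python
# Sample rows from the table above as (id, rev, content) tuples;
# the content strings are placeholders because the real content is elided.
rows = [
    (1, 1, "content 1-1"),
    (2, 1, "content 2-1"),
    (1, 2, "content 1-2"),
    (1, 3, "content 1-3"),
]


def latest_revisions(rows):
    """Keep only the row with the highest rev for each id, overwriting older ones."""
    latest = {}
    for row in rows:
        doc_id, rev, _content = row
        # Overwrite the stored row when this one has a newer revision.
        if doc_id not in latest or rev > latest[doc_id][1]:
            latest[doc_id] = row
    # Sort by id so the output order is deterministic.
    return sorted(latest.values())


print(latest_revisions(rows))
# [(1, 3, 'content 1-3'), (2, 1, 'content 2-1')]
```

Note that this loop keeps only the first row it sees when two rows with the same `id` share the greatest `rev`.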
I prefer to use as little code as possible...
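You can do this with `IN`: keep only the rows whose `(id, rev)` pair is `IN` the result of a subquery that selects each `id` together with its `MAX(rev)` using `GROUP BY id`. In my opinion, this is simpler... easier to read and maintain.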
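A minimal sketch of that `IN` query, run against an in-memory SQLite database so it is self-contained. The table name `docs` and the placeholder content strings are assumptions; MySQL also accepts a row constructor such as `(id, rev)` on the left of `IN (subquery)`, so the SQL itself should carry over.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the table name `docs`
# and the placeholder content strings are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "content 1-1"), (2, 1, "content 2-1"),
     (1, 2, "content 1-2"), (1, 3, "content 1-3")],
)

# Keep the rows whose (id, rev) pair is among the per-id maximums.
# Row values on the left of IN need SQLite 3.15 or newer.
query = """
    SELECT id, rev, content
    FROM docs
    WHERE (id, rev) IN (SELECT id, MAX(rev) FROM docs GROUP BY id)
    ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)
# [(1, 3, 'content 1-3'), (2, 1, 'content 2-1')]
```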
At first glance...
You only need the `MAX` aggregate function with a `GROUP BY` clause on `id`.

Things are never that simple, right?
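The `GROUP BY` query described above can be sketched like this, again on an in-memory SQLite table with the assumed name `docs`. It returns each `id` with its greatest `rev`, but not the rest of the row, which is exactly why it is not quite enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "content 1-1"), (2, 1, "content 2-1"),
     (1, 2, "content 1-2"), (1, 3, "content 1-3")],
)

# One row per id with its greatest rev. In standard SQL, `content` cannot
# simply be added to this SELECT list: it is neither grouped nor aggregated.
result = conn.execute(
    "SELECT id, MAX(rev) AS rev FROM docs GROUP BY id ORDER BY id"
).fetchall()
print(result)  # [(1, 3), (2, 1)]
```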
I just noticed that you also need the `content` column.

This is a very common problem in SQL: for each group identifier, find the entire row that holds the maximum value of some column. I've heard this question a lot in my career. In fact, it was one of the questions I answered in the technical interview for my current job.

It is so common that the Stack Overflow community created a tag just for this kind of question: greatest-n-per-group.
Basically, you have two ways to solve this problem:
Join with a simple `group-identifier, max-value-in-group` subquery

In this approach, you first find the `group-identifier, max-value-in-group` pairs (already solved above) in a subquery. You then join your table to the subquery with an equijoin on both `group-identifier` and `max-value-in-group`.

Left join the table with itself, tweaking the join conditions and filters
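In this approach, you left join the table to itself, with the equality condition on `group-identifier`. Then come two clever steps:

1. Add a second join condition requiring the left side's value to be less than the right side's value.
2. Once step 1 is in place, the row(s) that actually hold the maximum value have `NULL` on the right side (remember, this is a `LEFT JOIN`). We then filter the join result to keep only the rows with `NULL` on the right.

So you end up joining on equal `id` values with the left `rev` smaller than the right `rev`, and keeping only the rows where the right side's `id` is `NULL`.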
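Here is a minimal sketch of that self left join, using an in-memory SQLite table with the assumed name `docs` and the aliases `a` (left) and `b` (right).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "content 1-1"), (2, 1, "content 2-1"),
     (1, 2, "content 1-2"), (1, 3, "content 1-3")],
)

# Step 1: pair each row (a) with every row of the same id that has a larger rev (b).
# Step 2: rows holding the maximum rev find no such partner, so b.id is NULL.
query = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    LEFT JOIN docs AS b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3, 'content 1-3'), (2, 1, 'content 2-1')]
```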
Conclusion
Both methods will give exactly the same results.
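As a quick check of that claim on the sample data, this sketch runs both approaches against the same in-memory SQLite table (assumed name `docs`) and compares the rows. The subquery join is written the way the first approach describes it: a `GROUP BY` subquery joined back on both `id` and the maximum `rev`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "content 1-1"), (2, 1, "content 2-1"),
     (1, 2, "content 1-2"), (1, 3, "content 1-3")],
)

# Approach 1: equijoin the table to an (id, MAX(rev)) subquery on both columns.
SUBQUERY_JOIN = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    INNER JOIN (
        SELECT id, MAX(rev) AS rev
        FROM docs
        GROUP BY id
    ) AS b ON a.id = b.id AND a.rev = b.rev
    ORDER BY a.id, a.content
"""

# Approach 2: left join the table to itself and keep rows with no larger rev.
SELF_LEFT_JOIN = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    LEFT JOIN docs AS b ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id, a.content
"""

subquery_rows = conn.execute(SUBQUERY_JOIN).fetchall()
self_join_rows = conn.execute(SELF_LEFT_JOIN).fetchall()
print(subquery_rows == self_join_rows)  # True
```

MySQL requires an alias on a derived table, which is why the subquery is named `b`.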
If a `group-identifier` has two rows sharing the same `max-value-in-group`, both rows appear in the result with either method.

Both methods use only ANSI SQL, so you can use them with whatever "flavor" of RDBMS you prefer.
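To see the tie behavior, this sketch adds a hypothetical second row for `id` 2 with the same `rev` of 1, then runs both queries on an in-memory SQLite table (assumed name `docs`). Each query returns both tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
# Hypothetical data: id 2 now has two rows that share its maximum rev (1).
conn.executemany(
    "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "content 1-1"), (2, 1, "content 2-1"),
     (1, 2, "content 1-2"), (1, 3, "content 1-3"),
     (2, 1, "content 2-1b")],
)

SUBQUERY_JOIN = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    INNER JOIN (SELECT id, MAX(rev) AS rev FROM docs GROUP BY id) AS b
        ON a.id = b.id AND a.rev = b.rev
    ORDER BY a.id, a.content
"""
SELF_LEFT_JOIN = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    LEFT JOIN docs AS b ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id, a.content
"""

results = {
    "subquery join": conn.execute(SUBQUERY_JOIN).fetchall(),
    "self left join": conn.execute(SELF_LEFT_JOIN).fetchall(),
}
for name, rows in results.items():
    # Both lists contain (2, 1, 'content 2-1') and (2, 1, 'content 2-1b').
    print(name, rows)
```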
Both methods also tend to perform well, but your mileage may vary (RDBMS, database structure, indexes, etc.). So benchmark before picking one over the other, and choose the one that makes the most sense to you.
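One way to start such a benchmark is sketched below on synthetic, hypothetical data in an in-memory SQLite database, with a composite index on `(id, rev)` as one candidate to try. SQLite timings say nothing reliable about MySQL; for MySQL you would run the same two queries against your own data and indexes (for example, alongside `EXPLAIN`). The table name, sizes, index name and helper functions are all assumptions.

```python
import sqlite3
import time

SUBQUERY_JOIN = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    INNER JOIN (SELECT id, MAX(rev) AS rev FROM docs GROUP BY id) AS b
        ON a.id = b.id AND a.rev = b.rev
    ORDER BY a.id, a.content
"""
SELF_LEFT_JOIN = """
    SELECT a.id, a.rev, a.content
    FROM docs AS a
    LEFT JOIN docs AS b ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id, a.content
"""


def build_db(num_ids=1000, revs_per_id=20):
    """Create an in-memory table of synthetic revisions (hypothetical sizes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
    conn.executemany(
        "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
        (
            (doc_id, rev, f"doc {doc_id} rev {rev}")
            for doc_id in range(num_ids)
            for rev in range(1, revs_per_id + 1)
        ),
    )
    # A composite index on (id, rev) is one configuration worth benchmarking.
    conn.execute("CREATE INDEX idx_docs_id_rev ON docs (id, rev)")
    return conn


def best_time(conn, sql, repeats=5):
    """Run sql several times; return (fastest wall-clock seconds, rows)."""
    best, rows = float("inf"), []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows


conn = build_db()
results = {
    name: best_time(conn, sql)
    for name, sql in (("subquery join", SUBQUERY_JOIN),
                      ("self left join", SELF_LEFT_JOIN))
}
for name, (seconds, rows) in results.items():
    print(f"{name}: {len(rows)} rows, best of 5 runs: {seconds:.4f} s")
```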