SQL select only rows with maximum value on column

Question

I have this document table (here is a simplified version): id to content 11...21...12...13... How to select one row for each ID and only the largest tach? Using the above data, the result should contain two rows: [1,3,...] and [2,1,..]. I'm using MySQL. Currently I'm using a check in a while loop to detect and overwrite old revs in the result set. But is this the only way to achieve this result? No SQL solution?

P粉638343995 · Answer

My preference is to use as little code as possible...

You can do this using IN Try this:

SELECT * 
FROM t1 WHERE (id,rev) IN 
( SELECT id, MAX(rev)
  FROM t1
  GROUP BY id
)

In my opinion, it's less complex... easier to read and maintain.

P粉517475670 · Answer

At first glance...

All you need is a GROUP BY clause with a MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

Things are never that simple, are they?

I just noticed that you also need the content column.

This is a very common problem in SQL: find all the data for a row with some maximum value in each group identifier column. I hear this statement often in my career. In fact, this was one of the questions I answered in a technical interview at my current job.

In fact, the Stack Overflow community created a tag to address questions like this: greatest-n-per-group.

Basically, you have two ways to solve this problem:

Use simple `group-identifier, max-value-in-group`Subquery join

In this approach you first find the group-identifier, max-value-in-group (solved above) in the subquery. Then, join the table to the subquery and make it equal on group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Left join with self, adjust connection conditions and filters

In this approach you left join the table to itself. Equality exists in group-identifier. Then, 2 smart moves:

The second connection condition is that the value on the left is less than the value on the right
When you do step 1, the row that actually has the largest value will show NULL on the right (that's a LEFT JOIN, remember?). We then filter the join results to only show rows with NULL on the right side.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;

in conclusion

Both methods will bring exactly the same results.

If you have two rows group-identifier with max-value-in-group, both rows will appear in the results of both methods.

Both methods are SQL ANSI compatible, so they will work with your favorite RDBMS regardless of their "flavor".

Both approaches are also performance-friendly, but your situation may differ (RDBMS, database structure, indexes, etc.). So baselines when you choose one method over another. And make sure you choose the one that makes the most sense to you.

At first glance...

Things are never that simple, are they?

Use simple group-identifier, max-value-in-groupSubquery join

Left join with self, adjust connection conditions and filters

in conclusion

Use simple `group-identifier, max-value-in-group`Subquery join