SQL select only rows with maximum value on column
P粉937382230
2023-08-23 11:08:57
<p>I have this document form (here is a simplified version): </p>
<table class="s-table">
<thead>
<tr>
<th>id</th>
<th>Transfer</th>
<th>Content</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>...</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>...</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>...</td>
</tr>
<tr>
<td>1</td>
<td>3</td>
<td>...</td>
</tr>
</tbody>
</table>
<p>How to select one row per ID and only the largest RPM? </p><p>
Using the above data, the result should contain two rows: <code>[1, 3, ...]</code> and <code>[2, 1, ..]</code>. I'm using<em><strong>MySQL</strong></em>. </p>
<p>Currently, I'm using a check in a <code>while</code> loop to detect and overwrite old revs in the result set. But is this the only way to achieve this result? Don't have a <strong>SQL</strong> solution? </p>
My preference is to use as little code as possible...
You can do this using
IN
Try this:In my opinion, it's less complex... easier to read and maintain.
At first glance...
All you need is a
GROUP BY
clause with aMAX
aggregate function:Things are never that simple, are they?
I just noticed that you also need the
content
column.This is a very common problem in SQL: find all the data for a row with some maximum value in each group identifier column. I hear this statement often in my career. In fact, this was one of the questions I answered in a technical interview at my current job.
In fact, the Stack Overflow community created a tag to address questions like this: greatest-n-per-group.
Basically, you have two ways to solve this problem:
Use simple
group-identifier, max-value-in-group
Subquery joinIn this approach you first find the
group-identifier, max-value-in-group
(solved above) in the subquery. Then, join the table to the subquery and make it equal ongroup-identifier
andmax-value-in-group
:Left join with self, adjust connection conditions and filters
In this approach you left join the table to itself. Equality exists in
group-identifier
. Then, 2 smart moves:NULL
on the right (that's aLEFT JOIN
, remember?). We then filter the join results to only show rows with NULL on the right side.So you end up with:
in conclusion
Both methods will bring exactly the same results.
If you have two rows
group-identifier
withmax-value-in-group
, both rows will appear in the results of both methods.Both methods are SQL ANSI compatible, so they will work with your favorite RDBMS regardless of their "flavor".
Both approaches are also performance-friendly, but your situation may differ (RDBMS, database structure, indexes, etc.). So baselines when you choose one method over another. And make sure you choose the one that makes the most sense to you.