Retrieving Top N Records per Group in SQL
This guide demonstrates how to efficiently retrieve the top N records for each group in a SQL table, ordered by age (descending) and then alphabetically by person's name to handle ties.
Scenario:
Imagine a table containing information about people, their group affiliation, and their age:
Person | Group | Age |
---|---|---|
Bob | 1 | 32 |
Jill | 1 | 34 |
Shawn | 1 | 42 |
Jake | 2 | 29 |
Paul | 2 | 36 |
Laura | 2 | 39 |
The goal is to retrieve the top 2 (N=2) records for each group, prioritizing older individuals and resolving ties using alphabetical order. The desired output is:
Person | Group | Age |
---|---|---|
Shawn | 1 | 42 |
Jill | 1 | 34 |
Laura | 2 | 39 |
Paul | 2 | 36 |
Solutions:
Two common approaches are presented: UNION ALL
and the ROW_NUMBER()
window function.
Method 1: Using UNION ALL (Less Efficient for Large Datasets)
This method is straightforward but can become inefficient for tables with many groups and a large number of records per group. It involves creating a separate query for each group and combining the results using UNION ALL
. This example retrieves the top 2 records (N=2) for each group:
<code class="language-sql">( SELECT * FROM mytable WHERE `group` = 1 ORDER BY age DESC, person LIMIT 2 ) UNION ALL ( SELECT * FROM mytable WHERE `group` = 2 ORDER BY age DESC, person LIMIT 2 )</code>
This approach requires modification for each additional group. It's not scalable for a large number of groups.
Method 2: Using ROW_NUMBER() (More Efficient)
The ROW_NUMBER()
window function provides a more efficient and scalable solution. It assigns a unique rank to each row within each group based on the specified ordering.
<code class="language-sql">SELECT person, `group`, age FROM ( SELECT person, `group`, age, ROW_NUMBER() OVER (PARTITION BY `group` ORDER BY age DESC, person) as rn FROM mytable ) as ranked_data WHERE rn <= 2;</code>
This query first assigns a rank (rn
) to each row within each group, ordering by age (descending) and then by person's name. The outer query then filters the results to include only those rows where the rank is less than or equal to 2 (N=2). This approach is far more efficient and easily adaptable to different values of N and a variable number of groups.
Further Reading:
For a comprehensive exploration of techniques for selecting top N records per group, refer to this resource: https://www.php.cn/link/131632cb7eeb986974e1be59af67e8fe
The above is the detailed content of How to Retrieve the Top N Records for Each Group in SQL?. For more information, please follow other related articles on the PHP Chinese website!