When processing grouped data, you usually need to extract the first row in each grouping. This task can be accomplished using the DISTINCT ON
clause, a PostgreSQL-specific extension.
<code class="language-sql">SELECT DISTINCT ON (grouping_expression) selected_columns FROM table_name ORDER BY grouping_expression, additional_ordering_expression(s)</code>
grouping_expression: Expression used to define grouping. The leading expression in this clause must match the expression in the following ORDER BY
clause.
selected_columns: The columns to retrieve in the output.
additional_ordering_expression(s): Optional expression used to determine the order of rows within each grouping. These expressions must follow the grouping expression in the ORDER BY
clause.
Consider the purchases
table with the following schema:
<code class="language-sql">CREATE TABLE purchases ( id SERIAL PRIMARY KEY, customer VARCHAR(50) NOT NULL, total NUMERIC(10, 2) NOT NULL );</code>
and the following data:
<code class="language-sql">INSERT INTO purchases (customer, total) VALUES ('Joe', 5), ('Sally', 3), ('Joe', 2), ('Sally', 1);</code>
To retrieve the customer with the highest total purchase amount, use the following query:
<code class="language-sql">SELECT DISTINCT ON (customer) id, customer, total FROM purchases ORDER BY customer, total DESC, id;</code>
Output:
<code>id | customer | total ----------------------------- 1 | Joe | 5 2 | Sally | 3</code>
If the grouped value may contain NULL values, it is recommended to add ORDER BY
to the NULLS LAST
clause to ensure that rows with non-NULL values are processed first:
<code class="language-sql">ORDER BY customer, total DESC NULLS LAST, id;</code>
SELECT
Lists are not limited to grouping expressions. Any additional columns can be included, making this technique useful for replacing complex subqueries or aggregate/window functions.
DISTINCT ON
The efficiency of the query depends on the distribution of data and the number of rows in each grouping. It is usually very efficient for small groupings. However, for large groupings, the alternative technique described in the following link may provide better performance:
The above is the detailed content of How to Find the First Row in Each Group Using PostgreSQL's DISTINCT ON Clause?. For more information, please follow other related articles on the PHP Chinese website!