Aggregate a Single Column in Query with Multi-Column Table
In a situation where you have a query with multiple columns and desire to aggregate a single specific column, it can present a challenge. Consider the following query:
SELECT t1.foo1, t1.foo2, t2.foo3, t2.foo4, string_agg(t3.aggregated_field, ', ') FROM tbl1 t1 LEFT JOIN tbl2 t2 ON t1.id = t2.fkeyid LEFT JOIN tbl3 t3 ON t2.id = t3.fkeyid GROUP BY t1.foo1, t1.foo2, t2.foo3, t2.foo4, t2.foo5, t2.foo6 ORDER BY t2.foo5, t2.foo6
While the query works, it becomes verbose due to the need to list all non-aggregated fields in both the GROUP BY and ORDER BY clauses. This constraint arises because aggregates are not allowed in these clauses.
Fortunately, for PostgreSQL 9.1 and later, a simpler solution exists. By leveraging the concept of grouping by primary keys, the query can be dramatically simplified:
SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ') FROM tbl1 GROUP BY 1 ORDER BY foo7, foo8; -- had to spell out, since no longer in select list!
However, if the query involves multiple tables with complex relationships, it may be more efficient to adopt an alternative approach. By performing aggregation first and then joining the results, the non-aggregated portion of the query can be optimized:
SELECT t1.foo1, t1.foo2, ... , t2.bar1, t2.bar2, ... , a.aggregated_col FROM tbl1 t1 LEFT JOIN tbl2 t2 ON ... ... LEFT JOIN ( SELECT some_id, string_agg(agg_col, ', ') AS aggregated_col FROM agg_tbl a ON ... GROUP BY some_id ) a ON a.some_id = ?.some_id ORDER BY ...
In this scenario, the majority of the query can proceed without aggregation, leading to improved performance.
The above is the detailed content of How Can I Efficiently Aggregate a Single Column in a Multi-Column SQL Query?. For more information, please follow other related articles on the PHP Chinese website!