Optimization Strategies for GROUP BY Query to Retrieve Latest Row per User
Given a table log holding user messages with the columns log_date, user_id and payload, the task is to efficiently retrieve the latest row for each user on or before a given date (:mydate in the queries below).
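The source does not give a table definition, so here is a minimal setup for experimenting. The column types (date, integer, integer), the sample rows and the literal cutoff DATE '2024-01-31' standing in for :mydate are all assumptions. It also shows the plain GROUP BY formulation the title alludes to: take the latest qualifying date per user, then join back to fetch that row's payload.

```sql
-- Hypothetical setup: column types and sample rows are assumptions.
CREATE TEMP TABLE log (
   log_date date    NOT NULL,
   user_id  integer NOT NULL,
   payload  integer
);

INSERT INTO log (log_date, user_id, payload) VALUES
   ('2024-01-01', 1, 10), ('2024-01-05', 1, 11), ('2024-02-01', 1, 12),
   ('2024-01-03', 2, 20), ('2024-03-01', 2, 21),
   ('2024-02-10', 3, 30);

-- Plain GROUP BY formulation: latest qualifying date per user,
-- then a join back to the table to fetch that row's payload.
SELECT l.user_id, l.log_date, l.payload
FROM   log l
JOIN  (
   SELECT user_id, max(log_date) AS log_date
   FROM   log
   WHERE  log_date <= DATE '2024-01-31'  -- stands in for :mydate
   GROUP  BY user_id
   ) m ON m.user_id = l.user_id AND m.log_date = l.log_date
ORDER  BY l.user_id;
```

With this data the result is user 1 (2024-01-05, payload 11) and user 2 (2024-01-03, payload 20); user 3 is absent because its only row is dated after the cutoff. Note that the join-back returns two rows for a user who has two entries on the same latest log_date.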
Multicolumn Index
To improve read performance, create a multicolumn index on user_id and log_date:
CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST);
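As a sketch of the access pattern this index serves directly, the lookup below fetches the latest row for a single user. The temp table, sample rows, user_id = 1 and the literal cutoff are assumptions. Because the ORDER BY matches the index definition, PostgreSQL can start at the first index entry for that user within the cutoff and stop after one row; the LATERAL and recursive approaches in this article repeat this lookup once per user.

```sql
-- Hypothetical sample data; DATE '2024-01-31' stands in for :mydate.
CREATE TEMP TABLE log (log_date date NOT NULL, user_id int NOT NULL, payload int);
INSERT INTO log VALUES
   ('2024-01-01', 1, 10), ('2024-01-05', 1, 11), ('2024-02-01', 1, 12),
   ('2024-01-03', 2, 20), ('2024-03-01', 2, 21), ('2024-02-10', 3, 30);
CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST);

-- Latest row for one user: the sort order equals the index order,
-- so LIMIT 1 can be satisfied by the first qualifying index entry.
SELECT log_date, payload
FROM   log
WHERE  user_id = 1
AND    log_date <= DATE '2024-01-31'
ORDER  BY log_date DESC NULLS LAST
LIMIT  1;
```

On a table this small the planner may still prefer a sequential scan; the index pays off once the table is large.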
Index-Only Scans with Covering Index
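To allow index-only scans, define a covering index that also stores the payload column (the INCLUDE clause requires PostgreSQL 11 or later; index-only scans also rely on the table's visibility map, which VACUUM and autovacuum keep current):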
CREATE INDEX log_combo_covering_idx ON log (user_id, log_date DESC NULLS LAST) INCLUDE (payload);
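INCLUDE adds payload as a non-key column: it is stored in the index but not used for searching or ordering. One way to confirm this, sketched here on a throwaway temp table (an assumption), is the system catalog pg_index, where indnkeyatts counts only key columns and indnatts counts all columns.

```sql
-- Throwaway temp table; column types are assumptions.
CREATE TEMP TABLE log (log_date date NOT NULL, user_id int NOT NULL, payload int);
CREATE INDEX log_combo_covering_idx
   ON log (user_id, log_date DESC NULLS LAST) INCLUDE (payload);

-- Key columns (user_id, log_date) versus all stored columns (plus payload).
SELECT indnkeyatts, indnatts
FROM   pg_index
WHERE  indexrelid = 'log_combo_covering_idx'::regclass;
```

This should report 2 key columns and 3 columns in total.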
SELECT DISTINCT ON()
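For small tables, or when each user_id has only a few rows, SELECT DISTINCT ON is simple and efficient. Including user_id in the select list identifies each row, and sorting log_date DESC NULLS LAST matches the index definition above so the index can supply pre-sorted rows:
SELECT DISTINCT ON (user_id) user_id, log_date, payload
FROM   log
WHERE  log_date <= :mydate
ORDER  BY user_id, log_date DESC NULLS LAST;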
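Here is a runnable version of the DISTINCT ON query on assumed sample data, with the literal DATE '2024-01-31' in place of :mydate. DISTINCT ON (user_id) keeps the first row of each user_id group in ORDER BY order, which is why the ORDER BY must begin with user_id and then sort log_date descending.

```sql
-- Hypothetical sample data; DATE '2024-01-31' stands in for :mydate.
CREATE TEMP TABLE log (log_date date NOT NULL, user_id int NOT NULL, payload int);
INSERT INTO log VALUES
   ('2024-01-01', 1, 10), ('2024-01-05', 1, 11), ('2024-02-01', 1, 12),
   ('2024-01-03', 2, 20), ('2024-03-01', 2, 21), ('2024-02-10', 3, 30);

-- First row per user_id after sorting newest-first = latest qualifying row.
SELECT DISTINCT ON (user_id) user_id, log_date, payload
FROM   log
WHERE  log_date <= DATE '2024-01-31'
ORDER  BY user_id, log_date DESC NULLS LAST;
```

Expected rows: (1, 2024-01-05, 11) and (2, 2024-01-03, 20). User 3 has no row on or before the cutoff and so does not appear.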
Index Skip Scan Emulation
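For large tables with many rows per user_id, consider emulating an index skip scan (loose index scan) with a recursive CTE and a LATERAL join. Each iteration jumps to the next greater user_id and fetches that user's latest qualifying row with a single short index lookup, so the cost grows with the number of users rather than the number of rows: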
WITH RECURSIVE cte AS (
   (  -- parentheses required for ORDER BY / LIMIT in a UNION branch
   SELECT user_id, log_date, payload
   FROM   log
   WHERE  log_date <= :mydate
   ORDER  BY user_id, log_date DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT l.user_id, l.log_date, l.payload
      FROM   log l
      WHERE  l.user_id > c.user_id  -- lateral reference
      AND    l.log_date <= :mydate  -- repeat condition
      ORDER  BY l.user_id, l.log_date DESC NULLS LAST
      LIMIT  1
      ) l
   )
TABLE  cte
ORDER  BY user_id;
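To see the recursion on concrete data, here is the recursive CTE run against an assumed temp table with six sample rows and the index from earlier, using DATE '2024-01-31' in place of :mydate. The anchor picks the lowest user_id with its latest qualifying row; each recursive step looks for the next greater user_id; when the LATERAL subquery returns nothing, the recursion stops.

```sql
-- Hypothetical sample data and index; DATE '2024-01-31' stands in for :mydate.
CREATE TEMP TABLE log (log_date date NOT NULL, user_id int NOT NULL, payload int);
INSERT INTO log VALUES
   ('2024-01-01', 1, 10), ('2024-01-05', 1, 11), ('2024-02-01', 1, 12),
   ('2024-01-03', 2, 20), ('2024-03-01', 2, 21), ('2024-02-10', 3, 30);
CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST);

WITH RECURSIVE cte AS (
   (  -- anchor: lowest user_id with a qualifying row, newest row first
   SELECT user_id, log_date, payload
   FROM   log
   WHERE  log_date <= DATE '2024-01-31'
   ORDER  BY user_id, log_date DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   -- recursive step: next greater user_id; yields no row when none is left
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT l.user_id, l.log_date, l.payload
      FROM   log l
      WHERE  l.user_id > c.user_id
      AND    l.log_date <= DATE '2024-01-31'
      ORDER  BY l.user_id, l.log_date DESC NULLS LAST
      LIMIT  1
      ) l
   )
TABLE  cte
ORDER  BY user_id;
```

Expected rows: (1, 2024-01-05, 11) and (2, 2024-01-03, 20). The third iteration finds no user_id above 2 with a row on or before the cutoff (user 3's only entry is from February), so the recursion ends.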
Separate Users Table
If a separate users table lists each user_id once, the query can start from it and look up the latest log row per user directly, which gives simpler solutions:
LATERAL Join
SELECT u.user_id, l.log_date, l.payload
FROM   users u
CROSS  JOIN LATERAL (
   SELECT l.log_date, l.payload
   FROM   log l
   WHERE  l.user_id = u.user_id  -- lateral reference
   AND    l.log_date <= :mydate
   ORDER  BY l.log_date DESC NULLS LAST
   LIMIT  1
   ) l;
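CROSS JOIN LATERAL drops users that have no log row on or before the cutoff. If such users should appear with NULLs instead, a LEFT JOIN LATERAL ... ON true variant keeps them. This sketch uses assumed temp tables users and log with sample rows and a literal date in place of :mydate.

```sql
-- Hypothetical tables and data; DATE '2024-01-31' stands in for :mydate.
CREATE TEMP TABLE users (user_id int PRIMARY KEY);
INSERT INTO users VALUES (1), (2), (3);
CREATE TEMP TABLE log (log_date date NOT NULL, user_id int NOT NULL, payload int);
INSERT INTO log VALUES
   ('2024-01-01', 1, 10), ('2024-01-05', 1, 11), ('2024-02-01', 1, 12),
   ('2024-01-03', 2, 20), ('2024-03-01', 2, 21), ('2024-02-10', 3, 30);

-- LEFT JOIN ... ON true keeps every user, even when the subquery finds no row.
SELECT u.user_id, l.log_date, l.payload
FROM   users u
LEFT   JOIN LATERAL (
   SELECT l.log_date, l.payload
   FROM   log l
   WHERE  l.user_id = u.user_id
   AND    l.log_date <= DATE '2024-01-31'
   ORDER  BY l.log_date DESC NULLS LAST
   LIMIT  1
   ) l ON true
ORDER  BY u.user_id;
```

Expected rows: (1, 2024-01-05, 11), (2, 2024-01-03, 20) and (3, NULL, NULL).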
Correlated Subquery
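This variant packs both columns into one composite value, so it requires a registered composite type named combo (the column types below are assumptions; match them to the log table):
CREATE TYPE combo AS (log_date date, payload int);

SELECT user_id, (combo1).*  -- note parentheses
FROM  (
   SELECT u.user_id
        , (SELECT (l.log_date, l.payload)::combo
           FROM   log l
           WHERE  l.user_id = u.user_id
           AND    l.log_date <= :mydate
           ORDER  BY l.log_date DESC NULLS LAST
           LIMIT  1) AS combo1
   FROM   users u
   ) sub;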
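The correlated subquery needs the registered type combo. A variant that avoids creating one is to return the whole row of log itself, since every PostgreSQL table has a composite row type of the same name. This is a sketch on assumed temp tables users and log with sample rows and a literal cutoff; users without a qualifying row get NULLs because the scalar subquery yields NULL.

```sql
-- Hypothetical tables and data; DATE '2024-01-31' stands in for :mydate.
CREATE TEMP TABLE users (user_id int PRIMARY KEY);
INSERT INTO users VALUES (1), (2), (3);
CREATE TEMP TABLE log (log_date date NOT NULL, user_id int NOT NULL, payload int);
INSERT INTO log VALUES
   ('2024-01-01', 1, 10), ('2024-01-05', 1, 11), ('2024-02-01', 1, 12),
   ('2024-01-03', 2, 20), ('2024-03-01', 2, 21), ('2024-02-10', 3, 30);

-- "SELECT l" returns the whole row as a value of the table's row type,
-- so no separate CREATE TYPE is needed.
SELECT user_id, (latest).log_date, (latest).payload
FROM  (
   SELECT u.user_id
        , (SELECT l
           FROM   log l
           WHERE  l.user_id = u.user_id
           AND    l.log_date <= DATE '2024-01-31'
           ORDER  BY l.log_date DESC NULLS LAST
           LIMIT  1) AS latest
   FROM   users u
   ) sub
ORDER  BY user_id;
```

Expected rows: (1, 2024-01-05, 11), (2, 2024-01-03, 20) and (3, NULL, NULL).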
These techniques speed up the query by reading rows in index order, by emulating a skip scan so that each user_id costs one short index lookup, and by driving the lookup from a separate users table when one is available.