
How Can I Optimize GROUP BY Queries to Efficiently Retrieve the Latest Row for Each User?

DDD
Release: 2025-01-24 00:37:14


Optimization Strategies for a GROUP BY Query That Retrieves the Latest Row per User

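Given a table log of user messages with the columns log_date, user_id, and payload, the task is to efficiently retrieve the latest row for each user on or before a given date (:mydate in the queries below). The queries are written for PostgreSQL; DISTINCT ON, for example, is a PostgreSQL extension.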

Multicolumn Index

To speed up these reads, create a multicolumn index on user_id and log_date, with log_date sorted in the same direction as the queries' ORDER BY:

CREATE INDEX log_combo_idx ON log (user_id, log_date DESC NULLS LAST);
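As a rough illustration of why this index helps, the Python sketch below models it as a sorted list of (user_id, -log_date) keys. This is a toy under stated assumptions, not how PostgreSQL's B-tree is implemented. Because log_date is stored in descending order within each user_id, a single binary search for (user_id, :mydate) lands directly on that user's newest row on or before :mydate. The sample rows and the helper names build_index and latest_for_user are hypothetical.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical sample rows shaped like the log table: (log_date, user_id, payload)
LOG = [
    (date(2025, 1, 3), 1, 10),
    (date(2025, 1, 9), 1, 11),
    (date(2025, 1, 5), 2, 20),
    (date(2025, 1, 1), 3, 30),
    (date(2025, 1, 7), 3, 31),
]


def build_index(rows):
    """Toy model of (user_id, log_date DESC): negating the date ordinal makes
    an ascending sort put each user's newest row first."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[0].toordinal()))
    keys = [(user_id, -log_date.toordinal()) for log_date, user_id, _ in ordered]
    return keys, ordered


def latest_for_user(keys, ordered, user_id, mydate):
    """One binary search: the first key >= (user_id, -mydate) is the user's
    newest row on or before mydate, if that user has one."""
    pos = bisect_left(keys, (user_id, -mydate.toordinal()))
    if pos < len(keys) and keys[pos][0] == user_id:
        return ordered[pos]
    return None  # no row for this user on or before mydate


KEYS, ORDERED = build_index(LOG)
print(latest_for_user(KEYS, ORDERED, 1, date(2025, 1, 5)))
# (datetime.date(2025, 1, 3), 1, 10)
```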

Index-Only Scans with Covering Index

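To allow index-only scans, define a covering index that carries payload as a non-key column via INCLUDE (available since PostgreSQL 11). An index-only scan still relies on the table's visibility map being current, so it pays off most on a regularly vacuumed table: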

CREATE INDEX log_combo_covering_idx ON log (user_id, log_date DESC NULLS LAST) INCLUDE (payload);
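The difference between a plain and a covering index can be sketched with a toy Python model (hypothetical names, not PostgreSQL internals): a plain index entry stores only the key and a row pointer, so reading payload costs an extra fetch from the table (the heap), whereas a covering entry carries payload itself. The sketch counts those heap fetches; it deliberately ignores the visibility-map check that a real index-only scan also performs.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical table contents keyed by row id: (log_date, user_id, payload)
HEAP = {
    0: (date(2025, 1, 3), 1, 10),
    1: (date(2025, 1, 9), 1, 11),
    2: (date(2025, 1, 5), 2, 20),
}


class ToyIndex:
    """Sorted (user_id, -log_date) keys; with covering=True each entry also
    stores payload, mimicking INCLUDE (payload)."""

    def __init__(self, heap, covering):
        self.covering = covering
        self.heap_fetches = 0
        self.entries = sorted(
            ((uid, -d.toordinal()), row_id, payload if covering else None)
            for row_id, (d, uid, payload) in heap.items()
        )
        self.keys = [key for key, _, _ in self.entries]

    def latest_payload(self, heap, user_id, mydate):
        pos = bisect_left(self.keys, (user_id, -mydate.toordinal()))
        if pos == len(self.keys) or self.keys[pos][0] != user_id:
            return None  # no row for this user on or before mydate
        _, row_id, payload = self.entries[pos]
        if not self.covering:
            self.heap_fetches += 1  # payload lives only in the table row
            payload = heap[row_id][2]
        return payload


plain = ToyIndex(HEAP, covering=False)
covering = ToyIndex(HEAP, covering=True)
print(plain.latest_payload(HEAP, 1, date(2025, 1, 5)), plain.heap_fetches)        # 10 1
print(covering.latest_payload(HEAP, 1, date(2025, 1, 5)), covering.heap_fetches)  # 10 0
```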

SELECT DISTINCT ON()

For small tables, or when each user_id has only a few rows, SELECT DISTINCT ON is simple and efficient:

SELECT DISTINCT ON (user_id) user_id, log_date, payload
FROM log
WHERE log_date <= :mydate
ORDER BY user_id, log_date DESC NULLS LAST;  -- same sort order as the index
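The semantics of the DISTINCT ON query can be mirrored in a few lines of Python (a model for understanding, not how PostgreSQL executes it): filter by the WHERE clause, sort by the ORDER BY keys, then keep only the first row of each user_id group. Because the sort puts each user's newest qualifying row first, that first row is the answer. The function name and sample data are hypothetical.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (log_date, user_id, payload)
LOG = [
    (date(2025, 1, 3), 1, 10),
    (date(2025, 1, 9), 1, 11),
    (date(2025, 1, 5), 2, 20),
    (date(2025, 1, 1), 3, 30),
    (date(2025, 1, 7), 3, 31),
]


def distinct_on_latest(rows, mydate):
    """Model of
         SELECT DISTINCT ON (user_id) user_id, log_date, payload
         FROM log WHERE log_date <= :mydate
         ORDER BY user_id, log_date DESC NULLS LAST;
    """
    # WHERE log_date <= :mydate (a NULL log_date never satisfies it)
    kept = [r for r in rows if r[0] is not None and r[0] <= mydate]
    # ORDER BY user_id, log_date DESC
    kept.sort(key=lambda r: (r[1], -r[0].toordinal()))
    # DISTINCT ON (user_id): only the first row of each user_id group survives
    result = []
    for user_id, group in groupby(kept, key=itemgetter(1)):
        log_date, _, payload = next(group)
        result.append((user_id, log_date, payload))
    return result


print(distinct_on_latest(LOG, date(2025, 1, 6)))
```

Note that every qualifying row is still read and ordered before the duplicates are discarded, which is why this approach slows down as the number of rows per user grows.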

Index Skip Scan Emulation

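For large tables with many rows per user_id, consider emulating an index skip scan with a recursive CTE and a LATERAL join. The non-recursive term fetches the first qualifying row; each recursive step then uses the index to jump to the next larger user_id, so the work grows with the number of distinct users rather than with the number of rows: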

WITH RECURSIVE cte AS (
   (
   SELECT user_id, log_date, payload
   FROM   log
   WHERE  log_date <= :mydate
   ORDER  BY user_id, log_date DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT l.user_id, l.log_date, l.payload
      FROM   log l
      WHERE  l.user_id > c.user_id  -- lateral reference
      AND    l.log_date <= :mydate  -- repeat condition
      ORDER  BY l.user_id, l.log_date DESC NULLS LAST
      LIMIT  1
      ) l
   )
TABLE  cte
ORDER  BY user_id;
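To see why the recursive CTE scales with the number of users rather than the number of rows, here is a hedged Python model of it over a sorted-list stand-in for log_combo_idx (the names and data are hypothetical). The first lookup plays the non-recursive term; each later lookup plays the LATERAL subquery, seeking by binary search to the first entry whose user_id is greater than the previous one and then stepping past entries newer than :mydate.

```python
import math
from bisect import bisect_left
from datetime import date, timedelta


def build_index(rows):
    """Stand-in for the index: user_id ascending, log_date descending."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[0].toordinal()))
    keys = [(user_id, -log_date.toordinal()) for log_date, user_id, _ in ordered]
    return keys, ordered


def skip_scan_latest(keys, ordered, mydate):
    """Emulates the recursive CTE; returns (rows, index_lookups)."""

    def first_on_or_before(pos):
        # Step past entries newer than mydate (the repeated WHERE condition).
        while pos < len(ordered) and ordered[pos][0] > mydate:
            pos += 1
        return pos

    result = []
    pos = first_on_or_before(0)  # non-recursive term: first qualifying entry
    lookups = 1
    while pos < len(ordered):
        row = ordered[pos]
        result.append(row)
        # Recursive term: seek to the first entry with user_id > row's user_id.
        pos = first_on_or_before(bisect_left(keys, (row[1], math.inf)))
        lookups += 1
    return result, lookups


# Hypothetical data: 3 users x 200 daily rows each, (log_date, user_id, payload).
ROWS = [(date(2025, 1, 1) + timedelta(days=k), uid, uid * 1000 + k)
        for uid in (1, 2, 3) for k in range(200)]
KEYS, ORDERED = build_index(ROWS)
latest, lookups = skip_scan_latest(KEYS, ORDERED, date(2025, 2, 20))
print(lookups)  # 4: one per user plus the final, empty recursive step
```

With 600 rows the model performs only four lookups; the entries stepped over because they are newer than :mydate are the part no seek can skip.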

Separate Users Table

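If a separate users table holds one row per user, the set of user_ids no longer has to be derived from log, and each user costs a single index lookup. Two simpler forms are possible: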

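LATERAL Join

CROSS JOIN LATERAL drops users that have no row on or before :mydate; to keep them (with NULLs), write LEFT JOIN LATERAL (...) l ON true instead.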

SELECT u.user_id, l.log_date, l.payload
FROM   users u
CROSS  JOIN LATERAL (
   SELECT l.log_date, l.payload
   FROM   log l
   WHERE  l.user_id = u.user_id         -- lateral reference
   AND    l.log_date <= :mydate
   ORDER  BY l.log_date DESC NULLS LAST
   LIMIT  1
   ) l;
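With a users table, the LATERAL query amounts to one index seek per user. The Python sketch below models that loop (USERS, build_index, and lateral_latest are hypothetical names; the sorted list stands in for the index) and also shows the one behavioral difference between CROSS JOIN LATERAL and LEFT JOIN LATERAL ... ON true: whether users without a row on or before :mydate are dropped or kept with NULLs (None here).

```python
from bisect import bisect_left
from datetime import date

# Hypothetical sample data: log rows (log_date, user_id, payload) and the
# user_ids from a separate users table (user 4 has no log rows at all).
LOG = [
    (date(2025, 1, 3), 1, 10),
    (date(2025, 1, 9), 1, 11),
    (date(2025, 1, 5), 2, 20),
    (date(2025, 1, 1), 3, 30),
    (date(2025, 1, 7), 3, 31),
]
USERS = [1, 2, 3, 4]


def build_index(rows):
    """Stand-in for log_combo_idx: user_id ascending, log_date descending."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[0].toordinal()))
    keys = [(user_id, -log_date.toordinal()) for log_date, user_id, _ in ordered]
    return keys, ordered


def lateral_latest(users, keys, ordered, mydate, keep_missing=False):
    """For each user, one seek for the newest row on or before mydate.
    keep_missing=False models CROSS JOIN LATERAL (unmatched users dropped);
    keep_missing=True models LEFT JOIN LATERAL (...) ON true (None for NULL)."""
    result = []
    for user_id in users:
        pos = bisect_left(keys, (user_id, -mydate.toordinal()))
        if pos < len(keys) and keys[pos][0] == user_id:
            log_date, _, payload = ordered[pos]
            result.append((user_id, log_date, payload))
        elif keep_missing:
            result.append((user_id, None, None))
    return result


KEYS, ORDERED = build_index(LOG)
print(lateral_latest(USERS, KEYS, ORDERED, date(2025, 1, 6)))
print(lateral_latest(USERS, KEYS, ORDERED, date(2025, 1, 6), keep_missing=True))
```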

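Correlated Subquery

A scalar subquery can return only one column, so log_date and payload are packed into a single composite value and expanded again in the outer query with (combo1).*. Unlike CROSS JOIN LATERAL, this form keeps users that have no row on or before :mydate and returns NULLs for them.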

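-- Assumes a composite type combo created beforehand; the column types are an
-- assumption and must match log, e.g.:
-- CREATE TYPE combo AS (log_date date, payload int);
SELECT user_id, (combo1).*              -- note parentheses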
FROM (
   SELECT u.user_id
        , (SELECT (l.log_date, l.payload)::combo
           FROM   log l
           WHERE  l.user_id = u.user_id
           AND    l.log_date <= :mydate
           ORDER  BY l.log_date DESC NULLS LAST
           LIMIT  1) AS combo1
   FROM   users u
   ) sub;
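The packing trick in the correlated subquery can be mirrored in Python: the subquery must yield a single value, so it returns one Combo tuple (a stand-in for the composite type combo) or None for SQL NULL, and the outer loop expands it the way (combo1).* does. This hypothetical sketch uses a linear scan for brevity and models only the semantics, not the index access.

```python
from datetime import date
from typing import NamedTuple, Optional

# Hypothetical sample data: log rows (log_date, user_id, payload) and user_ids.
LOG = [
    (date(2025, 1, 3), 1, 10),
    (date(2025, 1, 9), 1, 11),
    (date(2025, 1, 5), 2, 20),
    (date(2025, 1, 1), 3, 30),
    (date(2025, 1, 7), 3, 31),
]
USERS = [1, 2, 3, 4]


class Combo(NamedTuple):
    """Stand-in for the composite type combo (log_date, payload)."""
    log_date: date
    payload: int


def latest_combo(rows, user_id, mydate) -> Optional[Combo]:
    """Models the scalar subquery: at most ONE value per user, or None (NULL)."""
    matches = [r for r in rows if r[1] == user_id and r[0] <= mydate]
    if not matches:
        return None
    log_date, _, payload = max(matches, key=lambda r: r[0])  # ORDER BY ... DESC LIMIT 1
    return Combo(log_date, payload)


def correlated_latest(users, rows, mydate):
    result = []
    for user_id in users:
        combo1 = latest_combo(rows, user_id, mydate)
        # (combo1).*: expand the composite; a NULL composite yields NULL fields
        log_date, payload = combo1 if combo1 is not None else (None, None)
        result.append((user_id, log_date, payload))
    return result


print(correlated_latest(USERS, LOG, date(2025, 1, 6)))
```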

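In short: a multicolumn index matching the queries' sort order (optionally covering, for index-only scans) is the foundation; DISTINCT ON is simplest when each user has few rows; the recursive CTE emulates a skip scan when users have many rows; and a separate users table reduces the job to one index lookup per user, via a LATERAL join or a correlated subquery.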


source:php.cn