Applications often necessitate extracting insights from event-based data, such as counting events within specific time intervals. This task poses a challenge when the interval size varies dynamically.
A data table with timestamped events requires a dynamic SQL query that groups events by arbitrary time intervals, including weekly, daily, hourly, or even 15-minute intervals.
For Postgres versions 14 and以降, the date_bin() function provides a straightforward solution. To retrieve one row per time slot with data, use the following query:
SELECT date_bin('15 min', e.ts, '2018-05-01') AS start_time , count(e.ts) AS events FROM event e GROUP BY 1 ORDER BY 1;
To get all time slots since a specific date, adjust the query as follows:
SELECT start_time, COALESCE(events, 0) AS events FROM ( SELECT generate_series(timestamp '2018-05-01', max(ts), interval '15 min') FROM event ) g(start_time) LEFT JOIN ( SELECT date_bin('15 min', e.ts, '2018-05-01'), count(e.ts) FROM event e WHERE e.ts >= '2018-05-01' -- filter early (optional) GROUP BY 1 ) e(start_time, events) USING (start_time) ORDER BY 1;
For older Postgres versions, the following query can be used:
WITH grid AS ( SELECT start_time , lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_time FROM ( SELECT generate_series(min(ts), max(ts), interval '17 min') AS start_time FROM event ) sub ) SELECT start_time, count(e.ts) AS events FROM grid g LEFT JOIN event e ON e.ts >= g.start_time AND e.ts < g.end_time GROUP BY start_time ORDER BY start_time;
This query provides one row for each time slot, with a count of 0 for slots without events.
Pre-storing interval data in the event table can improve performance. However, this approach doubles the table size. If the trade-off is acceptable, it can provide a significant speed boost.
The above is the detailed content of How to Efficiently Count Events within Dynamic Time Intervals in PostgreSQL?. For more information, please follow other related articles on the PHP Chinese website!