Many real-world applications require determining the count of preceding records within a specific time range. This article explores different approaches to achieving this in PostgreSQL, especially when dealing with large datasets and dynamic time ranges.
In Postgres 11 and later, the RANGE framing option of window functions enables a straightforward solution:
SELECT id, ts, COUNT(*) OVER (ORDER BY ts RANGE '1 hour' PRECEDING EXCLUDE CURRENT ROW) AS ct FROM test ORDER BY ts;
Despite its performance limitations, Roman's CTE-based solution remains an option:
SELECT id, ts, (SELECT COUNT(*)::int - 1 FROM unnest(dates) x WHERE x >= sub.ts - interval '1h') AS ct FROM ( SELECT id, ts, array_agg(ts) OVER (ORDER BY ts) AS dates FROM test ) sub;
The correlated subquery method offers superior performance:
SELECT id, ts, (SELECT COUNT(*) FROM test t1 WHERE t1.ts >= t.ts - interval '1h' AND t1.ts < t.ts) AS ct FROM test t ORDER BY ts;
For optimal performance, especially in scenarios with dynamic time ranges, a PL/pgSQL function combined with a cursor can be employed:
CREATE FUNCTION running_window_ct(_intv interval = '1 hour') RETURNS TABLE (id bigint, ts timestamp, ct int) LANGUAGE plpgsql AS $func$ ..... $func$;
SELECT * FROM running_window_ct();
Benchmarking these approaches on a dataset of 100,000 rows demonstrates the superiority of the PL/pgSQL function for scalability and performance:
100 rows: ROM: 27.656 ms ARR: 7.834 ms COR: 5.488 ms FNC: 1.115 ms 1000 rows: ROM: 2116.029 ms ARR: 189.679 ms COR: 65.802 ms FNC: 8.466 ms 100000 rows: ROM: DNF ARR: DNF COR: 6760 ms FNC: 828 ms
The above is the detailed content of How Can I Efficiently Count Previous Rows Within a Time Range in PostgreSQL?. For more information, please follow other related articles on the PHP Chinese website!