PostgreSQL SELECT DISTINCT
Performance Bottleneck: A Case Study
This article examines performance issues encountered with a SELECT DISTINCT
query on a PostgreSQL table containing approximately two million records.
Context
The tickers
table stores data from Coinbase Pro's "ticker" channel. A composite primary key includes the product_id
column.
Performance Problem
The query SELECT DISTINCT product_id FROM tickers
was anticipated to perform well due to the index on product_id
. However, execution consistently took 500-600 milliseconds.
Query Plan Investigation
EXPLAIN ANALYZE
showed the query planner defaulted to a sequential scan, ignoring the product_id
index. Forcing index use didn't resolve the performance problem.
Index Optimization Attempts
Creating a dedicated index on product_id
yielded only minor improvements, with sequential scans still preferred by the planner unless explicitly overridden.
Effective Solution: Index Skip Scan Emulation
The solution implemented emulates index skip scans using recursive queries with lateral joins. This approach significantly improved performance, reducing execution time to 0.75 milliseconds even with a dataset of 2.25 million rows.
Summary
PostgreSQL's current lack of native index skip scan capabilities is addressed by this emulation technique. This method effectively utilizes existing indexes and avoids the performance penalty of sequential scans for SELECT DISTINCT
queries on large tables.
The above is the detailed content of Why is My PostgreSQL `SELECT DISTINCT` So Slow, and How Can I Fix It?. For more information, please follow other related articles on the PHP Chinese website!