SELECT DISTINCT on PostgreSQL tables with composite primary keys
Causes of slow queries and how to optimize them
In a PostgreSQL database, the execution speed of a SELECT DISTINCT query depends on the table structure and data distribution. Although the product_id column in the tickers table is part of a composite primary key and is therefore covered by an index, a query that uses SELECT DISTINCT product_id FROM tickers to get the unique product_id values performs a sequential scan by default.
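To confirm which plan the planner actually picks, the query can be run under EXPLAIN. This is a minimal sketch that assumes the tickers table from the text exists; the plan nodes and timings you see depend on the PostgreSQL version, the table statistics, and the server settings.

<code class="language-sql">-- Show the chosen plan together with real run time and buffer usage.
-- ANALYZE executes the statement, so this takes as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT product_id
FROM   tickers;</code>

A "Seq Scan on tickers" node feeding an aggregate or unique step indicates that every row was read. An index-only scan may appear instead in some setups, but it still visits every index entry rather than skipping between distinct values.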
Reasons for slow performance
The main reason SELECT DISTINCT is slow is that the table holds many duplicate values of product_id. PostgreSQL cannot jump from one distinct value to the next, so it has to read every row (or every index entry) and discard the duplicates, even when only a handful of distinct product_id values exist.
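How much duplication there is can be estimated from the planner statistics without scanning the table. The sketch below assumes ANALYZE has been run on tickers (manually or by autovacuum); the numbers are estimates, not exact counts.

<code class="language-sql">-- n_distinct > 0: estimated number of distinct product_id values
-- n_distinct < 0: negated fraction of rows that are distinct (-1 means all unique)
SELECT s.n_distinct,
       c.reltuples AS estimated_rows
FROM   pg_stats s
JOIN   pg_class c     ON c.relname = s.tablename
JOIN   pg_namespace n ON n.oid = c.relnamespace
                     AND n.nspname = s.schemaname
WHERE  s.tablename = 'tickers'
AND    s.attname   = 'product_id';</code>

A small n_distinct next to a large estimated_rows is the situation described above: many rows are read to produce very few results.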
Solution: simulate index skip scan
Since PostgreSQL does not yet natively support index skip scans for this kind of query, you can use a recursive CTE (common table expression) to emulate one. Each iteration fetches the smallest product_id that is greater than the one found in the previous iteration, so duplicates are skipped instead of being read, and exactly one instance of each unique product_id is returned.
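A single step of this emulation can be written out by hand as two plain queries. This is only an illustration: it assumes product_id is a text column, and 'BTC-USD' is a hypothetical value standing in for whatever the first query returns.

<code class="language-sql">-- Step 1: the smallest product_id in the table.
SELECT product_id
FROM   tickers
ORDER  BY product_id
LIMIT  1;

-- Step 2: the next distinct value after the previous result.
-- 'BTC-USD' is a hypothetical placeholder for the result of step 1.
SELECT product_id
FROM   tickers
WHERE  product_id > 'BTC-USD'
ORDER  BY product_id
LIMIT  1;</code>

Repeating step 2 with each new result until no row comes back yields every distinct value once. The recursive CTE automates exactly this loop inside one statement.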
Improved solution
<code class="language-sql">WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT product_id
   FROM   tickers
   ORDER  BY 1
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT product_id
      FROM   tickers t
      WHERE  t.product_id > c.product_id  -- lateral reference
      ORDER  BY 1
      LIMIT  1
      ) l
   )
TABLE  cte;</code>
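This query uses a LATERAL join to walk through the values in sorted order: in each iteration, ORDER BY 1 LIMIT 1 picks the next larger product_id after the one from the previous iteration. The recursion ends when the lateral subquery returns no row, that is, after the largest product_id has been emitted.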
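Before replacing the original query, it may be worth checking that both approaches agree. The following sketch reuses the CTE from the text and compares its row count with a plain distinct count; the column aliases skip_scan_count and distinct_count are made up for this example.

<code class="language-sql">WITH RECURSIVE cte AS (
   (
   SELECT product_id
   FROM   tickers
   ORDER  BY 1
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT product_id
      FROM   tickers t
      WHERE  t.product_id > c.product_id
      ORDER  BY 1
      LIMIT  1
      ) l
   )
SELECT (SELECT count(*) FROM cte)                        AS skip_scan_count,
       (SELECT count(DISTINCT product_id) FROM tickers)  AS distinct_count;</code>

Because the CTE returns strictly increasing values that all come from tickers, equal counts imply that both queries return the same set. The count(DISTINCT ...) side still reads the whole table, so this is a one-off sanity check rather than something to run routinely.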
Conclusion
The execution time of SELECT DISTINCT product_id queries can be significantly improved by emulating an index skip scan with the recursive CTE method, which reduces the time required to retrieve the unique product_id values from the tickers table.