Why is SELECT DISTINCT Slow on a Table with a Composite Primary Key in PostgreSQL, and How Can It Be Optimized?

Patricia Arquette

Release: 2025-01-07 18:27:40

In PostgreSQL, the speed of a SELECT DISTINCT query depends on the table structure and the data distribution. Although the product_id column in the tickers table is part of a composite primary key and is therefore covered by an index, the query SELECT DISTINCT product_id FROM tickers performs a sequential scan by default.
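For concreteness, the table might look like the sketch below. Every column other than product_id, all of the types, and the order of the key columns are assumptions made for illustration; the article only states that product_id is part of a composite primary key. The key order matters because an index can only be walked in product_id order efficiently when product_id is its leading column.

```sql
-- Hypothetical schema: only the table name and product_id come from the article.
CREATE TABLE tickers (
    product_id  text    NOT NULL,
    trade_id    bigint  NOT NULL,   -- assumed second key column
    price       numeric,            -- assumed payload column
    PRIMARY KEY (product_id, trade_id)  -- product_id leads the index
);
```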

Reasons for slow performance

The main reason SELECT DISTINCT is slow here is that the table contains many duplicate values of product_id. PostgreSQL has no way to jump directly from one distinct value to the next, so it reads every row (or every index entry) and then removes the duplicates, even when only a handful of distinct values exist.
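One way to check which plan PostgreSQL chooses on your own data is EXPLAIN. The sketch below assumes the tickers table from this article. Note that the ANALYZE option actually executes the query, so it takes as long as the query itself.

```sql
-- Show the chosen plan together with actual run time and buffer usage.
-- ANALYZE runs the query for real, so expect it to take as long as the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT product_id
FROM   tickers;
```

A Seq Scan node at the bottom of the plan, typically feeding a HashAggregate or a sort, indicates that every row is being read.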

Solution: simulate index skip scan

Since PostgreSQL does not yet natively support index skip scans for this kind of query, you can use a recursive CTE (common table expression) to simulate one. Each iteration looks up the smallest product_id greater than the one found previously, so the query touches roughly one index entry per distinct value instead of reading every row.

Improved solution

<code class="language-sql">WITH RECURSIVE cte AS (
   (   -- parentheses required
   SELECT product_id
   FROM   tickers
   ORDER  BY 1
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT product_id
      FROM   tickers t
      WHERE  t.product_id > c.product_id  -- lateral reference
      ORDER  BY 1
      LIMIT  1
      ) l
   )
TABLE  cte;</code>

This query uses a LATERAL join to walk the index in product_id order: the first branch fetches the smallest product_id, and each recursive step fetches the next larger one until none remain.
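The same skip-scan emulation can also be written with a correlated subquery instead of a LATERAL join. The following is a sketch of that variant, not the form this article uses. It assumes the same tickers table and an index whose leading column is product_id, so that each min() lookup can be answered from the index.

```sql
WITH RECURSIVE cte AS (
   -- Anchor: the smallest product_id (NULL if the table is empty)
   SELECT min(product_id) AS product_id
   FROM   tickers
   UNION ALL
   -- Step: the smallest product_id greater than the previous one;
   -- min() yields NULL once no larger value exists
   SELECT (SELECT min(product_id)
           FROM   tickers
           WHERE  product_id > c.product_id)
   FROM   cte c
   WHERE  c.product_id IS NOT NULL   -- stop after the NULL row
   )
SELECT product_id
FROM   cte
WHERE  product_id IS NOT NULL;       -- drop the terminating NULL row
```

In this form the recursion always emits one final NULL row, which is why the outer query filters it out. The LATERAL version avoids that extra row because its subquery returns no rows at the end.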

Conclusion

The execution time of SELECT DISTINCT product_id can be reduced significantly by simulating an index skip scan with a recursive CTE, which retrieves the unique product_id values from the tickers table without reading every row. The gain is largest when the table holds few distinct values relative to its total number of rows.
