SQL: Eliminating Duplicate Rows in Tables
Challenge:
You're faced with a table lacking primary keys or constraints, containing multiple rows with identical data. The goal is to remove duplicate rows, preserving only a single instance of each unique row. We'll use a column named "key" to identify duplicates, ensuring only one entry exists for each distinct "key" value.
Solution:
This process involves two key steps:
The following SQL command accomplishes this:
<code class="language-sql">DELETE FROM dups a USING ( SELECT MIN(ctid) as ctid, key FROM dups GROUP BY key HAVING COUNT(*) > 1 ) b WHERE a.key = b.key AND a.ctid <> b.ctid;</code>
Important Note: This method doesn't offer control over which specific row is retained.
Illustrative Example:
Consider this sample table:
<code class="language-sql">CREATE TABLE people ( name varchar(50) NOT NULL, surname varchar(50) NOT NULL, age integer NOT NULL ); INSERT INTO people (name, surname, age) VALUES ('A.', 'Tom', 30), ('A.', 'Tom', 10), ('B.', 'Tom', 20), ('B', 'Chris', 20);</code>
To find the first occurrence of duplicates:
<code class="language-sql">SELECT MIN(ctid) as ctid, name, surname FROM people GROUP BY (name, surname) HAVING COUNT(*) > 1;</code>
Output:
ctid | name | surname |
---|---|---|
(0,1) | A. | Tom |
Deleting the non-first duplicates:
<code class="language-sql">DELETE FROM people a USING ( SELECT MIN(ctid) as ctid, name, surname FROM people GROUP BY (name, surname) HAVING COUNT(*) > 1 ) b WHERE a.name = b.name AND a.surname = b.surname AND a.ctid <> b.ctid;</code>
The remaining rows:
<code class="language-sql">SELECT * FROM people;</code>
Output:
name | surname | age |
---|---|---|
A. | Tom | 30 |
B. | Tom | 20 |
B | Chris | 20 |
The above is the detailed content of How to Efficiently Delete Duplicate Rows from a Table in SQL?. For more information, please follow other related articles on the PHP Chinese website!