Efficiently remove duplicate rows without unique identifiers
Removing duplicates can be a challenge when a data table lacks unique row identifiers. This article provides an efficient solution for removing duplicate rows while retaining the first occurrence of the row.
Let’s look at a table with duplicate rows:
col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|
john | 1 | 1 | 1 | 1 | 1 | 1 |
john | 1 | 1 | 1 | 1 | 1 | 1 |
sally | 2 | 2 | 2 | 2 | 2 | 2 |
sally | 2 | 2 | 2 | 2 | 2 | 2 |
The desired result after removing duplicate rows is:
col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|
john | 1 | 1 | 1 | 1 | 1 | 1 |
sally | 2 | 2 | 2 | 2 | 2 | 2 |
Solution using CTE and ROW_NUMBER
This method combines a common table expression (CTE) with the ROW_NUMBER() function. Inside the CTE, ROW_NUMBER() assigns each row a sequence number (RN) that restarts at 1 for every group of matching rows, so any row with RN greater than 1 is a duplicate that can be deleted.
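To make the numbering concrete, here is a minimal sketch that runs the ROW_NUMBER() step on a cut-down version of the sample table. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions) rather than the database the article targets, and it keeps only two columns for brevity.

```python
import sqlite3

# In-memory stand-in for the sample table (only col1 and col2, for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("john", 1), ("john", 1), ("sally", 2), ("sally", 2)],
)

# Number the rows inside each col1 group; the count restarts at 1 per group.
numbered = conn.execute(
    """
    SELECT col1, col2,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS RN
    FROM Table1
    ORDER BY col1, RN
    """
).fetchall()

for row in numbered:
    print(row)
```

Each group gets one row with RN = 1 and one with RN = 2; the rows numbered above 1 are the ones the delete step targets.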
Here is the SQL query with step-by-step instructions:
<code class="language-sql">WITH CTE AS (
    SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           -- Assign a sequence number within each group defined by col1
           RN = ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)
    FROM dbo.Table1
)
DELETE FROM CTE
WHERE RN > 1;  -- Delete rows with RN greater than 1 (the duplicates)</code>
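Instructions:
The CTE selects every column from dbo.Table1 and adds RN, a row number that restarts at 1 within each group of rows sharing the same col1 value.
PARTITION BY col1 means rows count as duplicates when their col1 values match. If rows should count as duplicates only when more columns match, list those columns in PARTITION BY as well.
The DELETE statement removes every row with RN greater than 1, leaving one row per group. In SQL Server, deleting through a CTE that reads from a single table removes the rows from the underlying table, here dbo.Table1.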
Output:
After executing the query, the updated table will contain:
col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|
john | 1 | 1 | 1 | 1 | 1 | 1 |
sally | 2 | 2 | 2 | 2 | 2 | 2 |
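For readers who want to reproduce this result without a SQL Server instance, the following is a hedged, self-contained sketch using Python's built-in sqlite3 module (SQLite 3.25 or later assumed). SQLite cannot delete through a CTE, so this variant swaps in SQLite's implicit rowid to target the rows numbered above 1. It illustrates the same ROW_NUMBER() idea but is not the article's exact query.

```python
import sqlite3

# Build the sample table from the article in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 "
    "(col1 TEXT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("john", 1, 1, 1, 1, 1, 1),
        ("john", 1, 1, 1, 1, 1, 1),
        ("sally", 2, 2, 2, 2, 2, 2),
        ("sally", 2, 2, 2, 2, 2, 2),
    ],
)

# Number the rows in each col1 group, then delete every row numbered above 1.
# rowid is SQLite-specific; it stands in for the delete-through-CTE step.
conn.execute(
    """
    DELETE FROM Table1
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY rowid) AS RN
            FROM Table1
        )
        WHERE RN > 1
    )
    """
)

remaining = conn.execute("SELECT * FROM Table1 ORDER BY col1").fetchall()
for row in remaining:
    print(row)
```

Ordering by rowid inside each partition keeps the earliest inserted row of every group, which matches the goal of retaining the first occurrence.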