Removing Duplicate Rows in SQL Server with T-SQL
This guide demonstrates how to remove duplicate rows from a SQL Server table using T-SQL, where duplicates are defined by the value of a single column. The example uses an Employee table with an EmployeeName column. The objective is to retain exactly one row for each distinct employee name.
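To make the example concrete, here is a minimal sketch of what such a table might look like. The column types, the IDENTITY key, and the sample names are assumptions for illustration. The guide itself only relies on the empId and EmployeeName columns.
<code class="language-sql">-- Hypothetical schema: only empId and EmployeeName are named in this guide.
CREATE TABLE Employee (
    empId        INT IDENTITY(1,1) PRIMARY KEY,  -- assumed unique identifier
    EmployeeName VARCHAR(100) NOT NULL
);

-- Made-up sample data containing duplicate names.
INSERT INTO Employee (EmployeeName)
VALUES ('Alice'), ('Bob'), ('Alice'), ('Alice'), ('Bob');</code>
With this data, the goal would be to end up with one row for Alice and one for Bob.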
A robust method leverages window functions. The following query accomplishes this:
<code class="language-sql">DELETE x
FROM (
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM Employee
) x
WHERE rn > 1;</code>
This query uses ROW_NUMBER() to assign a sequential number to each row within each group (partition) of identical EmployeeName values. The PARTITION BY EmployeeName clause groups the rows, and ORDER BY empId determines the numbering order within each group. Assuming empId is a unique identifier, the numbering is deterministic: the row with the lowest empId in each group always receives 1. The alias rn holds that number.
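The numbering is easier to see on a small data set. The following sketch runs the same ROW_NUMBER() expression over an inline list of made-up rows (the VALUES list stands in for the Employee table and is purely illustrative), so it can be run without touching any real data.
<code class="language-sql">-- Inline sample rows stand in for the Employee table (hypothetical data).
SELECT empId,
       EmployeeName,
       rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
FROM (VALUES (1, 'Alice'),
             (2, 'Bob'),
             (3, 'Alice'),
             (4, 'Alice'),
             (5, 'Bob')) AS Employee(empId, EmployeeName)
ORDER BY EmployeeName, empId;

-- Result:
-- empId  EmployeeName  rn
-- 1      Alice         1
-- 3      Alice         2
-- 4      Alice         3
-- 2      Bob           1
-- 5      Bob           2</code>
Numbering restarts at 1 for each distinct name, and within a name it follows empId.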
The WHERE rn > 1 clause targets every row numbered above 1 within its group, so the statement deletes all duplicates and keeps only the first row of each group (the one with rn = 1, which is the row with the lowest empId).
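The same statement is often written with a common table expression instead of a derived table. The sketch below is an equivalent formulation of the same technique, not a different one. The CTE name Numbered is an arbitrary choice.
<code class="language-sql">-- Same logic as the derived-table form, expressed as a CTE.
-- Note: the statement preceding WITH must be terminated with a semicolon.
WITH Numbered AS (
    SELECT rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM Employee
)
DELETE FROM Numbered
WHERE rn > 1;</code>
Deleting from the CTE removes the underlying Employee rows, because the CTE references a single base table.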
Before executing the DELETE statement, preview the rows that will be affected. To do so, replace DELETE x with SELECT *:
<code class="language-sql">SELECT *
FROM (
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM Employee
) x
WHERE rn > 1;</code>
This allows for verification before permanently removing any data.
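As an additional safeguard, the delete can be run inside an explicit transaction so that the result can be inspected before it becomes permanent. This is a minimal sketch of that idea rather than part of the original method. It assumes the Employee table described above and that holding the transaction open briefly is acceptable in your environment.
<code class="language-sql">BEGIN TRANSACTION;

DELETE x
FROM (
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM Employee
) x
WHERE rn > 1;

-- Number of rows removed by the DELETE immediately above.
SELECT @@ROWCOUNT AS DeletedRows;

-- Sanity check: any name that still appears more than once.
-- This should return no rows if the delete worked as intended.
SELECT EmployeeName, COUNT(*) AS Copies
FROM Employee
GROUP BY EmployeeName
HAVING COUNT(*) > 1;

-- Undo the delete. Use COMMIT TRANSACTION instead once the result looks right.
ROLLBACK TRANSACTION;</code>
Until the transaction is committed or rolled back, the deleted rows stay locked, so it is best to keep the inspection step short.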