Home > Database > Mysql Tutorial > body text

SQL 中删除重复记录

WBOY
Release: 2016-06-07 17:48:27
Original
823 people have browsed it

这是一篇在mssql server 2008中的一种删除重复记录的sql语句,有需要的朋友可以参考一下哦。

在Database中可能由于某种原因如用户输入,导入数据失败等 导致了重复记录. 如果你没有用主键,约束,或来其它机制实现数据完整性,那最后总是重复记录在你的中.现在让我们来看在SQL SERVER 2008中如何删除这些记录, 首先,可以模拟造一些简单重复记录:

 代码如下 复制代码

Create Table dbo.Employee

([Id] int Primary KEY ,

[Name] varchar(50),

[Age] int,

[Sex] bit default 1)

 

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(1,'James',25,default)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(2,'James',25,default)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(3,'James',25,default)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(4,'Lisa',24,0)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(5,'Lisa',24,0)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(6,'Lisa',24,0)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(7,'Mirsa',23,0)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(8,'Mirsa',23,0)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(9,'Mirsa',23,0)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(10,'John',26,default)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(11,'Abraham',28,default)

Insert Into Employee ([Id] , [Name] , [Age] , [Sex] ) Values(12,'Lincoln',30,default)

 

Select * From dbo.Employee

首先我们使用最常见的方法:

 代码如下 复制代码

Delete From Employee Where Name in (

Name

From Employee Group By Name Having Count(Name)>1);


接着使用RowNumber():

 代码如下 复制代码

Delete T From(

   Select Row_Number() Over(Partition By [Name] Order By (SELECT 0)) As RowNumber,* From Employee) T

Where T.RowNumber > 1;

With Du as

(

  select ROW_NUMBER() Over(Partition by [Name] Order by (SELECT 0)) as rn

  FROM Employee

)

Delete From Dups

Where rn>1;

WITH Dups As

(

  Select [ID],[Name],[Age],[Sex]

    , ROW_NUMBER() OVER(Partition By [Name] Order By (SELECT 0)) AS rn

    ,RANK() OVER(Partition By [Name] Order By (SELECT 0)) AS rnk

  FROM Employee

)

DELETE FROM Dups

WHERE rnrnk;

下面是这四个T-SQL查询的执行计划:

 

你可以看到没有用CTE的方法开销最大, 主要是在Table Spool, 这里开销了44%, Table Spool 是一个物理运算符。

Table Spool 运算符扫描输入,并将各行的一个副本放入隐藏的假脱机表中,此表存储在 tempdb 数据库中并且仅在查询的生存期内存在。如果重绕该运算符(例如通过 Nested Loops 运算符重绕),但不需要任何重新绑定,则将使用假脱机数据,而不用重新扫描输入。

 

注意上面的方法只是在重复记录比较少的情况下, 如果重复记录多. DELETE将会非常慢, 最好的方法是复制目标数据到另一个新表,删除原来的表,重命名新表为原来的表. 或用临时表, 这样还可以减少数据库事务日志. 看下面的T-SQL:

 代码如下 复制代码

WITH Dups As

(

  Select [ID],[Name],[Age],[Sex]

    , ROW_NUMBER() OVER(Partition By [ID] Order By (SELECT 0)) AS rn

  FROM Employee

)

Select [ID],[Name],[Age],[Sex]

INTO dbo.EmployeeDupsTmp

FROM Dups

WHERE rn=1

 

DROP TABLE dbo.Employee;

 

EXEC sp_rename 'dbo.EmployeeDupsTmp','Employee'

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template
About us Disclaimer Sitemap
php.cn:Public welfare online PHP training,Help PHP learners grow quickly!