如何用SQL语句去掉重复记录
海量数据(百万以上),其中有些全部字段都相同,有些部分字段相同,怎样高效去除重复? 如果要删除手机(mobilePhone),电话(officePhone),邮件(email)同时都相同的数据,以前一直使用这条语句进行去重: delete from 表 where id not in (select max(id) f
海量数据(百万以上),其中有些全部字段都相同,有些部分字段相同,怎样高效去除重复?
如果要删除手机(mobilePhone),电话(officePhone),邮件(email)同时都相同的数据,以前一直使用这条语句进行去重:
delete from 表 where id not in (select max(id) from 表 group by mobilePhone,officePhone,email ) or delete from 表 where id not in (select min(id) from 表 group by mobilePhone,officePhone,email )
其中下面这条会稍快些。上面这条数据对于100万以内的数据效率还可以,重复数1/5的情况下几分钟到几十分钟不等,但是如果数据量达到300万以上,效率骤降,如果重复数据再多点的话,常常会几十小时跑不完,有时候会锁表跑一夜都跑不完。无奈只得重新寻找新的可行方法,今天终于有所收获:
//查询出唯一数据的ID,并把他们导入临时表tmp中 select min(id) as mid into tmp from 表 group by mobilePhone,officePhone,email //查询出去重后的数据并插入finally表中 insert into finally select (除ID以外的字段) from customers_1 where id in (select mid from tmp)
效率对比:用delete方法对500万数据去重(1/2重复)约4小时。4小时,很长的时间。
用临时表插入对500万数据去重(1/2重复)不到10分钟。
SQL语句去掉重复记录,获取重复记录
按照某几个字段名称查找表中存在这几个字段的重复数据并按照插入的时间先后进行删除,条件取决于order by 和row_num。
方法一按照多条件重复处理:
delete tmp from( select row_num = row_number() over(partition by 字段,字段 order by 时间 desc) from 表 where 时间> getdate()-1 ) tmp where row_num > 1
方法二按照单一条件进行去重:
delete from 表 where 主键ID not in( select max(主键ID) from 表 group by 需要去重的字段 having count(需要去重的字段)>=1 )
注意:为提高效率如上两个方法都可以使用临时表, not in 中的表可以先提取临时表#tmp,
然后采用not exists来执行,为避免数量过大,可批量用Top控制删除量
delete top(2) from 表 where not exists (select 主键ID from #tmp where #tmp.主键ID=表.主键ID)

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

HQL and SQL are compared in the Hibernate framework: HQL (1. Object-oriented syntax, 2. Database-independent queries, 3. Type safety), while SQL directly operates the database (1. Database-independent standards, 2. Complex executable queries and data manipulation).

Pinduoduo software provides a lot of good products, you can buy them anytime and anywhere, and the quality of each product is strictly controlled, every product is genuine, and there are many preferential shopping discounts, allowing everyone to shop online Simply can not stop. Enter your mobile phone number to log in online, add multiple delivery addresses and contact information online, and check the latest logistics trends at any time. Product sections of different categories are open, search and swipe up and down to purchase and place orders, and experience convenience without leaving home. With the online shopping service, you can also view all purchase records, including the goods you have purchased, and receive dozens of shopping red envelopes and coupons for free. Now the editor has provided Pinduoduo users with a detailed online way to view purchased product records. method. 1. Open your phone and click on the Pinduoduo icon.

0.What does this article do? We propose DepthFM: a versatile and fast state-of-the-art generative monocular depth estimation model. In addition to traditional depth estimation tasks, DepthFM also demonstrates state-of-the-art capabilities in downstream tasks such as depth inpainting. DepthFM is efficient and can synthesize depth maps within a few inference steps. Let’s read about this work together ~ 1. Paper information title: DepthFM: FastMonocularDepthEstimationwithFlowMatching Author: MingGui, JohannesS.Fischer, UlrichPrestel, PingchuanMa, Dmytr

DDREASE is a tool for recovering data from file or block devices such as hard drives, SSDs, RAM disks, CDs, DVDs and USB storage devices. It copies data from one block device to another, leaving corrupted data blocks behind and moving only good data blocks. ddreasue is a powerful recovery tool that is fully automated as it does not require any interference during recovery operations. Additionally, thanks to the ddasue map file, it can be stopped and resumed at any time. Other key features of DDREASE are as follows: It does not overwrite recovered data but fills the gaps in case of iterative recovery. However, it can be truncated if the tool is instructed to do so explicitly. Recover data from multiple files or blocks to a single

"Usage of Division Operation in OracleSQL" In OracleSQL, division operation is one of the common mathematical operations. During data query and processing, division operations can help us calculate the ratio between fields or derive the logical relationship between specific values. This article will introduce the usage of division operation in OracleSQL and provide specific code examples. 1. Two ways of division operations in OracleSQL In OracleSQL, division operations can be performed in two different ways.

Oracle and DB2 are two commonly used relational database management systems, each of which has its own unique SQL syntax and characteristics. This article will compare and differ between the SQL syntax of Oracle and DB2, and provide specific code examples. Database connection In Oracle, use the following statement to connect to the database: CONNECTusername/password@database. In DB2, the statement to connect to the database is as follows: CONNECTTOdataba

I cry to death. The world is madly building big models. The data on the Internet is not enough. It is not enough at all. The training model looks like "The Hunger Games", and AI researchers around the world are worrying about how to feed these data voracious eaters. This problem is particularly prominent in multi-modal tasks. At a time when nothing could be done, a start-up team from the Department of Renmin University of China used its own new model to become the first in China to make "model-generated data feed itself" a reality. Moreover, it is a two-pronged approach on the understanding side and the generation side. Both sides can generate high-quality, multi-modal new data and provide data feedback to the model itself. What is a model? Awaker 1.0, a large multi-modal model that just appeared on the Zhongguancun Forum. Who is the team? Sophon engine. Founded by Gao Yizhao, a doctoral student at Renmin University’s Hillhouse School of Artificial Intelligence.

The performance of JAX, promoted by Google, has surpassed that of Pytorch and TensorFlow in recent benchmark tests, ranking first in 7 indicators. And the test was not done on the TPU with the best JAX performance. Although among developers, Pytorch is still more popular than Tensorflow. But in the future, perhaps more large models will be trained and run based on the JAX platform. Models Recently, the Keras team benchmarked three backends (TensorFlow, JAX, PyTorch) with the native PyTorch implementation and Keras2 with TensorFlow. First, they select a set of mainstream
