MySQL: deduplicating a large (36.6 GB) data import
ringa_lee 2017-04-17 13:27:41

I have a 36.6 GB CSV file that needs to be deduplicated and imported into the database (order doesn't matter; the only requirement is that the result is a table with no duplicate rows). How should I handle this?

All replies (4)
PHPzhong

If the Foo field is not allowed to contain duplicates, just declare it UNIQUE and duplicate values will be rejected automatically:

CREATE TABLE xxx (
   ...
   Foo VARCHAR(255) UNIQUE NOT NULL,  -- length is only an example
   ...
);
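
For the import itself, MySQL's LOAD DATA supports an IGNORE keyword that skips rows colliding with the unique key instead of aborting. A minimal sketch, assuming placeholder file path and column names:

LOAD DATA INFILE '/path/to/data.csv'   -- placeholder path
IGNORE INTO TABLE xxx                  -- IGNORE: skip rows that duplicate the unique key
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(Foo, bar);                            -- placeholder column list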
大家讲道理

You can import everything into the database first, and then remove the duplicate rows with a SQL operation.
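
For example, if the raw CSV rows went into a staging table first, a deduplicated copy can be built with SELECT DISTINCT (table names below are placeholders):

CREATE TABLE clean_rows AS
SELECT DISTINCT * FROM raw_rows;   -- keeps one copy of each duplicate row
DROP TABLE raw_rows;
RENAME TABLE clean_rows TO raw_rows;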

伊谢尔伦

Create a unique index on the fields that might contain duplicates.

When inserting, use insert ignore into ...
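
A rough sketch of that approach, with placeholder table and column names (raw_rows standing in for a staging table holding the imported CSV):

ALTER TABLE target ADD UNIQUE KEY uniq_foo (Foo);   -- unique index on the field that may repeat

INSERT IGNORE INTO target (Foo, bar)
SELECT Foo, bar FROM raw_rows;                      -- rows with a duplicate Foo are silently skipped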

刘奇

You can use bash: sort the file first, then use awk to compare adjacent lines and write a line to a new file only when it differs from the previous one. This is actually not slow, but it may require a lot of disk space.
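
A rough sketch of that pipeline (file names are placeholders; sort's temporary files may need roughly as much space as the input):

# Keep a line only when it differs from the previous (sorted) line.
sort big.csv | awk 'NR == 1 || $0 != prev { print } { prev = $0 }' > deduped.csv
# sort -u big.csv > deduped.csv does the same in a single step.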

A better approach is to let the database handle it by itself when importing, such as defining unique fields as mentioned above.
