MySQL: deduplicating a large (36.6 GB) data import
ringa_lee 2017-04-17 13:27:41

I have a 36.6 GB CSV file that needs to be deduplicated and imported into the database (order doesn't matter; the only requirement is that the result is a table with no duplicate rows). How should I handle this?

All replies (4)
PHPzhong

If the Foo field is not allowed to contain duplicates, just declare it UNIQUE and duplicate values will be rejected automatically:

CREATE TABLE xxx (
   ...
   Foo VARCHAR(255) UNIQUE NOT NULL,  -- length is only an example
   ...
);
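
For the import itself, MySQL's LOAD DATA supports an IGNORE keyword that skips rows colliding with the unique key instead of aborting. A minimal sketch, assuming placeholder file path and column names:

LOAD DATA INFILE '/path/to/data.csv'   -- placeholder path
IGNORE INTO TABLE xxx                  -- IGNORE: skip rows that duplicate the unique key
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(Foo, bar);                            -- placeholder column list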
大家讲道理

You can import everything into the database first, and then remove the duplicate rows with a SQL operation.
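
For example, if the raw CSV rows went into a staging table first, a deduplicated copy can be built with SELECT DISTINCT (table names below are placeholders):

CREATE TABLE clean_rows AS
SELECT DISTINCT * FROM raw_rows;   -- keeps one copy of each duplicate row
DROP TABLE raw_rows;
RENAME TABLE clean_rows TO raw_rows;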

伊谢尔伦

Create a unique index on the fields that might contain duplicates.

When inserting, use insert ignore into ...
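
A rough sketch of that approach, with placeholder table and column names (raw_rows standing in for a staging table holding the imported CSV):

ALTER TABLE target ADD UNIQUE KEY uniq_foo (Foo);   -- unique index on the field that may repeat

INSERT IGNORE INTO target (Foo, bar)
SELECT Foo, bar FROM raw_rows;                      -- rows with a duplicate Foo are silently skipped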

刘奇

You can use bash: sort the file first, then use awk to compare adjacent lines and write a line to a new file only when it differs from the previous one. This is actually not slow, but it may require a lot of disk space.
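
A rough sketch of that pipeline (file names are placeholders; sort's temporary files may need roughly as much space as the input):

# Keep a line only when it differs from the previous (sorted) line.
sort big.csv | awk 'NR == 1 || $0 != prev { print } { prev = $0 }' > deduped.csv
# sort -u big.csv > deduped.csv does the same in a single step.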

A better approach is to let the database handle it by itself when importing, such as defining unique fields as mentioned above.
