When importing data in batches into the database, it is recommended to use the MySQL import tool mysqlimport, which can be told to ignore duplicate rows (the --ignore option) or to overwrite them (--replace). http://www.runoob.com/mysql/mysql-database-import.html
I think your method of deleting first and then inserting is good. The duplicates you are talking about are rows with duplicate primary keys, right? Then the rows you want to insert must be the newest data, so delete the old rows first. Assuming the primary key is 'uid': start a transaction, run 'delete ... where uid in (...)', then insert the new rows and commit the transaction. If you want to optimize further, run a single 'select ... where uid in (...)' to find which rows already exist, and then skip inserting those duplicates.
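A minimal sketch of this delete-then-insert-in-a-transaction pattern, using SQLite in place of MySQL (the pattern is the same; the table name `user` and the column names are hypothetical examples, not from the original thread):

```python
import sqlite3

# SQLite stand-in for MySQL; `user`/`uid`/`name` are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (uid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'old'), (2, 'stale')")

new_rows = [(1, "alice"), (3, "carol")]  # incoming batch; uid 1 collides

with conn:  # opens a transaction; commits on success, rolls back on error
    uids = [uid for uid, _ in new_rows]
    placeholders = ",".join("?" * len(uids))
    # Delete any old rows that share a uid with the incoming batch...
    conn.execute(f"DELETE FROM user WHERE uid IN ({placeholders})", uids)
    # ...then insert the fresh rows.
    conn.executemany("INSERT INTO user VALUES (?, ?)", new_rows)

print(sorted(conn.execute("SELECT uid, name FROM user")))
# [(1, 'alice'), (2, 'stale'), (3, 'carol')]
```

Wrapping both statements in one transaction means a failure partway through leaves the table unchanged rather than with the old rows deleted but the new ones missing.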
@好雨云 As he said, REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE is a solution.
However, INSERT ... ON DUPLICATE KEY UPDATE is recommended: with a large amount of data it is more efficient than REPLACE, because REPLACE deletes the conflicting row and inserts a new one, which requires extra maintenance of the primary key index.
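A sketch of the upsert the answer recommends. MySQL's syntax is shown in the comment; since MySQL itself needs a server, the runnable example below uses SQLite's equivalent ON CONFLICT ... DO UPDATE form (SQLite 3.24+), with hypothetical table and column names:

```python
import sqlite3

# MySQL form of the recommended statement:
#   INSERT INTO user (uid, name) VALUES (1, 'alice')
#   ON DUPLICATE KEY UPDATE name = VALUES(name);
# SQLite (3.24+) expresses the same upsert with ON CONFLICT ... DO UPDATE.
# `user`/`uid`/`name` are hypothetical names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (uid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'old')")

conn.executemany(
    "INSERT INTO user (uid, name) VALUES (?, ?) "
    "ON CONFLICT(uid) DO UPDATE SET name = excluded.name",
    [(1, "alice"), (2, "bob")],  # uid 1 collides and is updated in place
)
print(sorted(conn.execute("SELECT uid, name FROM user")))
# [(1, 'alice'), (2, 'bob')]
```

The colliding row is updated in place rather than deleted and re-inserted, which is the efficiency point made above.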
You can try REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.
Reference:
http://blog.csdn.net/mchdba/article/details/8647560
http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
Create a temporary table, insert all the data into it first, and then insert from the temporary table into the target table.
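One way to read this staging-table approach, sketched with SQLite standing in for MySQL (the filter used to skip duplicates, and the names `user`/`staging`, are assumptions for illustration; the answer itself does not specify them):

```python
import sqlite3

# SQLite stand-in for MySQL; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (uid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'alice')")

# Load the whole batch into a temporary staging table, duplicates and all.
conn.execute("CREATE TEMPORARY TABLE staging (uid INTEGER, name TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, "dup"), (2, "bob")])

# Copy over only rows whose uid is not already in the target table.
conn.execute(
    "INSERT INTO user "
    "SELECT s.uid, s.name FROM staging s "
    "WHERE s.uid NOT IN (SELECT uid FROM user)"
)
print(sorted(conn.execute("SELECT uid, name FROM user")))
# [(1, 'alice'), (2, 'bob')]
```

This keeps the duplicate check to a single set-based INSERT ... SELECT instead of a round trip per row.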