Removing Duplicates with a Unique Index
In an attempt to prevent duplicate rows, a normal (non-unique) index was mistakenly created on columns A, B, C, and D. Because a normal index does not enforce uniqueness, duplicate records ended up in a table of 20 million rows. The question is whether adding a unique index on these columns will remove the duplicates without damaging the rest of the existing data.
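Before changing any index, it can help to see how many duplicates there actually are. The query below is a minimal sketch: the table name mytable is borrowed from the example later in this article, and the column alias copies is only illustrative.

<code class="sql">-- List combinations of (A, B, C, D) that occur more than once.
-- "copies" is a hypothetical alias for the number of rows per combination.
SELECT A, B, C, D, COUNT(*) AS copies
FROM mytable
GROUP BY A, B, C, D
HAVING COUNT(*) > 1
ORDER BY copies DESC
LIMIT 20;</code>

On a 20 million row table this query may take a while, although the existing normal index on the four columns can likely be used for the grouping.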
Correcting the Index and Handling Duplicates
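Adding a unique index with a plain ALTER TABLE statement (without the IGNORE modifier) will fail with a duplicate-entry error (error 1062), because duplicate records already exist. In MySQL versions before 5.7.4, adding the IGNORE modifier makes the statement succeed: one row from each set of duplicates is kept and the others are deleted.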
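On a server older than 5.7.4, the statement with the IGNORE modifier would look roughly like the sketch below. The names mytable and myindex are taken from the example later in this article; if the existing normal index already uses the name myindex, a different name would be needed, or the old index would have to be dropped first.

<code class="sql">-- Only valid on MySQL versions before 5.7.4; later versions reject
-- the IGNORE modifier on ALTER TABLE with a syntax error.
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);</code>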
Alternative Approach for MySQL Versions 5.7.4 and Above
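For MySQL versions 5.7.4 and above, where ALTER TABLE no longer supports the IGNORE modifier, a different approach is recommended: copy the data to a temporary table, empty the original, add the unique index, and reinsert the rows with INSERT IGNORE, which is still supported: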
Syntax for Removing Duplicates with INSERT IGNORE
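<code class="sql">-- Copy all rows into a working table
CREATE TABLE tmp_data SELECT * FROM mytable;
-- Empty the original table
TRUNCATE TABLE mytable;
-- Add the unique index while the table is empty
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
-- Reinsert the rows; duplicates are skipped
INSERT IGNORE INTO mytable SELECT * FROM tmp_data;
-- Remove the working table
DROP TABLE tmp_data;</code>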
Additional Considerations
The documentation does not specify which row of each set of duplicates is retained when the IGNORE modifier is used. It is advisable to test the procedure on a smaller dataset before applying it to a large table.
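One way to run such a test is to apply the same steps to a sample copy of the table and compare row counts before and after. The following is only a sketch: mytable_test, tmp_test, myindex_test, and the sample size of 100000 are hypothetical choices, not part of the original solution.

<code class="sql">-- Build a sample copy with the same structure as the real table.
CREATE TABLE mytable_test LIKE mytable;
INSERT INTO mytable_test SELECT * FROM mytable LIMIT 100000;

-- Repeat the deduplication steps against the copy only.
CREATE TABLE tmp_test SELECT * FROM mytable_test;
TRUNCATE TABLE mytable_test;
ALTER TABLE mytable_test ADD UNIQUE INDEX myindex_test (A, B, C, D);
INSERT IGNORE INTO mytable_test SELECT * FROM tmp_test;

-- Compare how many rows went in and how many survived.
SELECT (SELECT COUNT(*) FROM tmp_test) AS rows_before,
       (SELECT COUNT(*) FROM mytable_test) AS rows_after;

-- Clean up the test tables when finished.
DROP TABLE tmp_test;
DROP TABLE mytable_test;</code>

Because LIMIT without ORDER BY returns an arbitrary set of rows, the sample may contain few or no duplicates; inspecting the surviving rows in mytable_test before dropping it shows which copy was kept in this particular run.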