MySQL: Remove Duplicates from Massive Database Swiftly
Large MySQL tables cluttered with duplicate rows need to be cleaned up quickly, especially when query execution time is a concern. The table in question holds over a million rows, and duplicates may take up as much as half of its storage.
The aim is to deduplicate the table quickly: keep exactly one row for each unique combination of "text1" and "text2", and when several rows share a combination, keep a non-null "text3" value if any of them has one. This prevents data loss and preserves data integrity.
Conventional approaches such as DISTINCT and GROUP BY work well on smaller tables but proved too slow here, with queries running for more than 20 minutes. A more efficient solution is needed.
One such solution combines "INSERT ... ON DUPLICATE KEY UPDATE" with "IFNULL()" and performs well:
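For reference, the kind of GROUP BY query that becomes slow at this scale might look like the sketch below. The source does not show the exact query that was tried, so this is an assumption; it also assumes the table contains only the three columns named above, and `tmp_grouped` is a hypothetical table name.

```sql
-- Conventional approach (sketch): collapse duplicates with GROUP BY.
-- MAX() ignores NULLs, so it returns a non-NULL text3 whenever the group has one.
-- On a large table without a suitable index, MySQL typically has to build and
-- sort a big temporary result to do the grouping.
CREATE TABLE tmp_grouped AS
SELECT text1, text2, MAX(text3) AS text3
FROM yourtable
GROUP BY text1, text2;
```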
CREATE TABLE tmp LIKE yourtable;
ALTER TABLE tmp ADD UNIQUE (text1, text2);
INSERT INTO tmp SELECT * FROM yourtable
    ON DUPLICATE KEY UPDATE text3 = IFNULL(text3, VALUES(text3));
RENAME TABLE yourtable TO deleteme, tmp TO yourtable;
DROP TABLE deleteme;
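To see how the IFNULL() merge rule in the INSERT statement behaves, the following minimal sketch runs it on four sample rows. The table name `dedup_demo`, the column types, and the values are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical demo table; the unique key plays the same role as the one on tmp.
CREATE TEMPORARY TABLE dedup_demo (
  text1 VARCHAR(20) NOT NULL,
  text2 VARCHAR(20) NOT NULL,
  text3 VARCHAR(20) NULL,
  UNIQUE KEY (text1, text2)
);

-- Rows are processed in order. When a row collides with an existing key,
-- text3 is overwritten only if the stored value is still NULL.
-- (VALUES() is deprecated in this context from MySQL 8.0.20 in favor of
-- row aliases, but it still works.)
INSERT INTO dedup_demo (text1, text2, text3) VALUES
  ('a', 'b', NULL),
  ('a', 'b', 'x'),
  ('a', 'b', 'y'),
  ('c', 'd', NULL)
ON DUPLICATE KEY UPDATE text3 = IFNULL(text3, VALUES(text3));

SELECT text1, text2, text3 FROM dedup_demo ORDER BY text1, text2;
-- a | b | x      (first non-NULL value wins; 'y' is ignored)
-- c | d | NULL   (no non-NULL value was available)
```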
This approach avoids GROUP BY and DISTINCT, and with them the filesort that slows down queries on large temporary tables. Duplicates are detected through the unique index as each row is inserted. A full scan of the original table is still unavoidable, but beyond that the method does little extra work, which makes it an efficient way to remove duplicates from very large tables.
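Because the script ends with DROP TABLE, it may be worth checking the new table after the INSERT and before the RENAME and DROP steps. The queries below are a sketch of such a check, not part of the original solution. One point they help catch: a MySQL unique index permits multiple NULL values, so rows where "text1" or "text2" is NULL would not be merged by the unique key.

```sql
-- 1. Compare row counts to see how many duplicates were removed.
SELECT (SELECT COUNT(*) FROM yourtable) AS rows_before,
       (SELECT COUNT(*) FROM tmp)       AS rows_after;

-- 2. Expect zero rows: any result here is a (text1, text2) pair that still
--    appears more than once, for example because one of the columns is NULL.
SELECT text1, text2, COUNT(*) AS copies
FROM tmp
GROUP BY text1, text2
HAVING COUNT(*) > 1;

-- 3. Expect 0: counts original rows whose non-NULL text3 has no non-NULL
--    counterpart in the deduplicated table.
SELECT COUNT(*) AS lost_text3
FROM yourtable AS y
JOIN tmp AS t
  ON t.text1 = y.text1 AND t.text2 = y.text2
WHERE y.text3 IS NOT NULL
  AND t.text3 IS NULL;
```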