Optimizing Duplicate Row Deletion in SQLite Database
Question:
Encountering a massive table with over 36 million rows in SQLite3, you face the challenge of removing duplicate records based on two columns (hash and d). However, there's a catch: no primary key column exists in the table. Seeking an efficient solution, you inquire about the most effective approach to tackle this task.
Answer:
To effectively eliminate duplicate rows in the absence of a primary key, a unique identifier is needed for each record.
Solution:
Leverage SQLite3's special rowid column as our unique identifier. The rowid column is an integer that automatically increments for each new row added to the table. Using rowid, you can retain the record with the lowest rowid value for each (hash, d) combination.
SQL Query:
Implement the following SQL query to achieve your goal:
DELETE FROM YourTable WHERE rowid NOT IN ( SELECT MIN(rowid) FROM YourTable GROUP BY hash, d )
This query identifies and eliminates all rows with duplicate (hash, d) combinations, ensuring only one instance of each combination remains in the table. The MIN(rowid) subquery ensures that for duplicate records, the record with the lowest rowid value is retained.
The above is the detailed content of How to Efficiently Delete Duplicate Rows in a Large SQLite Table Without a Primary Key?. For more information, please follow other related articles on the PHP Chinese website!