Requirements: Remove duplicate data from a user table based on the name/email/card_num field;
Ideas: Use the group by method to query the 'duplicate' data, store the data in a temporary table, and then store the data in the temporary table into the specified table;
Misunderstandings and Solution: The group by method can only obtain some fields (specified fields without duplication), and cannot obtain complete data at one time. However, the id in the group by result set can be obtained through the max function, and then all the data can be queried based on the id set. Record.
SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num;
SELECT ID from (SELECT max(id ) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T;
SELECT * from users where id in (SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T);
create TEMP TABLE tmp_data as SELECT * from users where id in (SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name, email,card_num) as T);
insert into users_copy1 select * from tmp_data;
select count(*) from users_copy1;
Test results: 2300 pieces of data are duplicated among 1.4w pieces of data. The actual running result is 0.7s, which basically meets the current needs.
For more MySQL related technical articles, please visit the MySQL Tutorial column to learn!
The above is the detailed content of Database deduplication based on specified fields. For more information, please follow other related articles on the PHP Chinese website!