
A Brief Look at Quickly Merging and Deduplicating MySQL Tables with Tens of Millions of Rows

Jun 01, 2016 01:42 PM

bitsCN.com

Goal: tables a and b already exist; merge their data into table c with duplicates removed. Tables a and b each hold roughly 20 million rows. (In the session below they are named a2kw, b2kw and c2kw.)

Environment
Operating system: CentOS release 5.6, 64-bit
Memory: 8 GB
Database version: 5.1.56-community, 64-bit
Database parameters: defaults

Tables and data volume
Table a:

    mysql> desc a2kw;
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | c1    | varchar(20) | YES  | MUL | NULL    |       |
    | c2    | varchar(30) | YES  |     | NULL    |       |
    | c3    | varchar(12) | YES  |     | NULL    |       |
    | c4    | varchar(20) | YES  |     | NULL    |       |
    +-------+-------------+------+-----+---------+-------+
    4 rows in set (0.00 sec)

Table b:

    mysql> desc b2kw;
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | c1    | varchar(20) | YES  |     | NULL    |       |
    | c2    | varchar(30) | YES  |     | NULL    |       |
    | c3    | varchar(12) | YES  |     | NULL    |       |
    | c4    | varchar(20) | YES  |     | NULL    |       |
    +-------+-------------+------+-----+---------+-------+
    4 rows in set (0.00 sec)

A sample of the data in a and b:

    mysql> select * from a2kw limit 10;
    +-----------+-----------+------+----------+
    | c1        | c2        | c3   | c4       |
    +-----------+-----------+------+----------+
    | 662164461 | 131545534 | TOM0 | 20120520 |
    | 226662142 | 605685564 | TOM0 | 20120516 |
    | 527008225 | 172557633 | TOM0 | 20120514 |
    | 574408183 | 350897450 | TOM0 | 20120510 |
    | 781619324 | 583989494 | TOM0 | 20120510 |
    | 158872754 | 775676430 | TOM0 | 20120512 |
    | 815875622 | 631631832 | TOM0 | 20120514 |
    | 905943640 | 477433083 | TOM0 | 20120514 |
    | 660790641 | 616774715 | TOM0 | 20120512 |
    | 999083595 | 953186525 | TOM0 | 20120513 |
    +-----------+-----------+------+----------+
    10 rows in set (0.01 sec)

Basic steps

1. Create an index on table b (table a already has one on c1, shown as MUL above)

    mysql> select count(*) from b2kw;
    +----------+
    | count(*) |
    +----------+
    | 20000002 |
    +----------+
    1 row in set (0.00 sec)

    mysql> create index ind_b2kw_c1 on b2kw(c1);
    Query OK, 20000002 rows affected (1 min 2.94 sec)
    Records: 20000002  Duplicates: 0  Warnings: 0

Rows: 20,000,002; time: 1 min 2.94 sec.

2. Insert a and b into the intermediate table temp

Create the intermediate table (empty, with the same columns as c2kw):

    mysql> create table temp select * from c2kw where 1=2;
    Query OK, 0 rows affected (0.00 sec)
    Records: 0  Duplicates: 0  Warnings: 0

Insert the data:

    mysql> insert into temp select * from a2kw;
    Query OK, 20000002 rows affected (13.23 sec)
    Records: 20000002  Duplicates: 0  Warnings: 0

    mysql> insert into temp select * from b2kw;
    Query OK, 20000002 rows affected (13.27 sec)
    Records: 20000002  Duplicates: 0  Warnings: 0

    mysql> select count(*) from temp;
    +----------+
    | count(*) |
    +----------+
    | 40000004 |
    +----------+
    1 row in set (0.00 sec)

Rows: 40,000,004; time: 26.50 sec.

3. Build a composite index on temp and remove duplicates through a forced index scan

Rows count as duplicates when c1, c2 and c3 are equal; each such group keeps max(c4).

    mysql> create index ind_temp_c123 on temp(c1,c2,c3);
    Query OK, 40000004 rows affected (3 min 43.87 sec)
    Records: 40000004  Duplicates: 0  Warnings: 0

Check the execution plan:

    mysql> explain select c1,c2,c3,max(c4) from temp FORCE INDEX (ind_temp_c123) group by c1,c2,c3;
    +----+-------------+-------+-------+---------------+---------------+---------+------+----------+-------+
    | id | select_type | table | type  | possible_keys | key           | key_len | ref  | rows     | Extra |
    +----+-------------+-------+-------+---------------+---------------+---------+------+----------+-------+
    |  1 | SIMPLE      | temp  | index | NULL          | ind_temp_c123 | 71      | NULL | 40000004 |       |
    +----+-------------+-------+-------+---------------+---------------+---------+------+----------+-------+
    1 row in set (0.05 sec)

    mysql> insert into c2kw select c1,c2,c3,max(c4) from temp FORCE INDEX (ind_temp_c123) group by c1,c2,c3;
    Query OK, 20000004 rows affected (2 min 0.85 sec)
    Records: 20000004  Duplicates: 0  Warnings: 0

Actual elapsed time for this step: about 6 min.
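The merge in steps 2 and 3 is an unfiltered append of both tables into a staging table followed by a GROUP BY on (c1, c2, c3). Since c4 is a VARCHAR holding values that look like YYYYMMDD dates, max(c4) compares strings, which for fixed-width dates picks the latest one. The sketch below replays those statements with Python's built-in sqlite3 module as a stand-in for MySQL, so it only illustrates the logic, under these assumptions: SQLite has no FORCE INDEX (its planner decides whether to use the composite index), the staging table is given the hypothetical name temp_merge instead of temp to avoid confusion with SQLite's temp schema, and the rows are made up.

```python
import sqlite3

# Column layout of a2kw/b2kw/c2kw as shown by `desc` in the article.
COLUMNS = "c1 VARCHAR(20), c2 VARCHAR(30), c3 VARCHAR(12), c4 VARCHAR(20)"


def merge_dedup(rows_a, rows_b):
    """Merge a2kw and b2kw into c2kw, one row per (c1, c2, c3) with MAX(c4)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    for name in ("a2kw", "b2kw", "c2kw"):
        cur.execute(f"CREATE TABLE {name} ({COLUMNS})")
    cur.executemany("INSERT INTO a2kw VALUES (?, ?, ?, ?)", rows_a)
    cur.executemany("INSERT INTO b2kw VALUES (?, ?, ?, ?)", rows_b)

    # Step 2: an empty staging table with c2kw's columns (WHERE 1 = 2 copies
    # no rows), then append both sources without any deduplication.
    cur.execute("CREATE TABLE temp_merge AS SELECT * FROM c2kw WHERE 1 = 2")
    cur.execute("INSERT INTO temp_merge SELECT * FROM a2kw")
    cur.execute("INSERT INTO temp_merge SELECT * FROM b2kw")

    # Step 3: composite index on the grouping key, then keep one row per key.
    # SQLite has no FORCE INDEX, so index use is left to its query planner.
    cur.execute("CREATE INDEX ind_temp_c123 ON temp_merge (c1, c2, c3)")
    cur.execute(
        "INSERT INTO c2kw "
        "SELECT c1, c2, c3, MAX(c4) FROM temp_merge GROUP BY c1, c2, c3"
    )

    # Drop the staging table, as step 4 does with temp.
    cur.execute("DROP TABLE temp_merge")
    conn.commit()
    merged = cur.execute("SELECT * FROM c2kw ORDER BY c1, c2, c3").fetchall()
    conn.close()
    return merged


if __name__ == "__main__":
    # Made-up rows: key ("1", "10", "TOM0") appears in both tables and
    # key ("2", "20", "TOM0") appears twice inside a2kw.
    a = [("1", "10", "TOM0", "20120510"),
         ("2", "20", "TOM0", "20120511"),
         ("2", "20", "TOM0", "20120512")]
    b = [("1", "10", "TOM0", "20120520"),
         ("3", "30", "TOM0", "20120514")]
    for row in merge_dedup(a, b):
        print(row)
    # ('1', '10', 'TOM0', '20120520')
    # ('2', '20', 'TOM0', '20120512')
    # ('3', '30', 'TOM0', '20120514')
```

Five staged rows collapse to three: duplicates are removed whether they come from different source tables or from within the same table, because the GROUP BY only looks at the combined staging data.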
4. Drop the intermediate table

    mysql> drop table temp;
    Query OK, 0 rows affected (0.99 sec)

Actual elapsed time: about 1 sec.
5. Create indexes on c

    mysql> create index ind_c2kw_c1 on c2kw(c1);
    Query OK, 20000004 rows affected (49.74 sec)
    Records: 20000004  Duplicates: 0  Warnings: 0

    mysql> create index ind_c2kw_c2 on c2kw(c2);
    Query OK, 20000004 rows affected (1 min 47.20 sec)
    Records: 20000004  Duplicates: 0  Warnings: 0

    mysql> create index ind_c2kw_c3 on c2kw(c3);
    Query OK, 20000004 rows affected (2 min 42.02 sec)
    Records: 20000004  Duplicates: 0  Warnings: 0

Actual elapsed time: about 5 min.
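In step 5 the three indexes on c are built only after c has been fully loaded, so the bulk insert never has to maintain secondary indexes row by row. The sketch below mirrors that load-then-index order with Python's sqlite3 module as a stand-in for MySQL (an assumption; the function name load_then_index is hypothetical and the sample rows come from the a2kw output above), then checks that the indexes exist and that a lookup on c2 can use ind_c2kw_c2.

```python
import sqlite3


def load_then_index(rows):
    """Bulk-load c2kw first, then build its three single-column indexes."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE c2kw "
        "(c1 VARCHAR(20), c2 VARCHAR(30), c3 VARCHAR(12), c4 VARCHAR(20))"
    )
    # Load every row while the table has no secondary indexes to maintain.
    cur.executemany("INSERT INTO c2kw VALUES (?, ?, ?, ?)", rows)

    # Build each index once, over the already-loaded data.
    for col in ("c1", "c2", "c3"):
        cur.execute(f"CREATE INDEX ind_c2kw_{col} ON c2kw ({col})")
    conn.commit()

    # PRAGMA index_list returns (seq, name, unique, origin, partial) rows.
    index_names = {row[1] for row in cur.execute("PRAGMA index_list('c2kw')")}
    # The last column of each EXPLAIN QUERY PLAN row is its text description.
    plan = cur.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM c2kw WHERE c2 = ?", ("131545534",)
    ).fetchall()
    conn.close()
    return index_names, " ".join(str(step[-1]) for step in plan)


if __name__ == "__main__":
    sample = [("662164461", "131545534", "TOM0", "20120520"),
              ("226662142", "605685564", "TOM0", "20120516")]
    names, plan = load_then_index(sample)
    print(sorted(names))
    print(plan)
```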
6. Truncate tables a and b

    mysql> truncate table a2kw;
    Query OK, 0 rows affected (1.15 sec)

    mysql> truncate table b2kw;
    Query OK, 0 rows affected (1.34 sec)

Actual elapsed time: about 3 sec.

The whole process took roughly 15 minutes in total.

Author: RuleV5
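The timings and row counts reported in the six steps can be cross-checked with a little arithmetic. The figures below are copied from the console output (the select count(*) and explain timings are left out), and Decimal keeps the sums exact. The statement times add up to 756.60 s, about 12.6 minutes, so the "roughly 15 minutes" figure presumably also covers time spent between statements. The row counts show that 20,000,000 of the 40,000,004 staged rows were collapsed by the GROUP BY.

```python
from decimal import Decimal

# Statement timings copied from the console output, in seconds.
STEP_SECONDS = {
    1: ["62.94"],                      # create index ind_b2kw_c1
    2: ["13.23", "13.27"],             # two INSERT ... SELECT into temp
    3: ["223.87", "120.85"],           # composite index + GROUP BY insert
    4: ["0.99"],                       # drop table temp
    5: ["49.74", "107.20", "162.02"],  # three indexes on c2kw
    6: ["1.15", "1.34"],               # truncate a2kw and b2kw
}


def step_totals(steps):
    """Sum each step's timings exactly using Decimal."""
    return {step: sum(map(Decimal, times), Decimal(0))
            for step, times in steps.items()}


totals = step_totals(STEP_SECONDS)
grand_total = sum(totals.values(), Decimal(0))

# Row counts reported by the console.
rows_a = rows_b = 20_000_002    # rows copied from a2kw and from b2kw
rows_temp = rows_a + rows_b     # matches count(*) on temp: 40,000,004
rows_c = 20_000_004             # rows inserted into c2kw by the GROUP BY
collapsed = rows_temp - rows_c  # staged rows merged away as duplicates

if __name__ == "__main__":
    for step, seconds in totals.items():
        print(f"step {step}: {seconds} s")
    print(f"total: {grand_total} s = {grand_total / 60:.2f} min")
    print(f"duplicate rows removed: {collapsed:,}")
```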
