Comparison of several MySQL methods for inserting or modifying large amounts of data-Mysql Tutorial-php.cn

Comparison of several MySQL methods for inserting or modifying large amounts of data

小云云

Release： 2017-11-27 09:57:05

Original

4957 people have browsed it

Whether it is daily business data processing or database import and export, you may encounter the insertion or modification of a large amount of data that needs to be processed. The method of insertion or modification and the database engine will all affect the insertion speed. This article aims to analyze and compare various methods from a theoretical and practical perspective to facilitate the selection of insertion methods in future applications.

Insertion analysis

The time required to insert a record in MySQL is composed of the following factors, and the numbers represent approximate proportions:

Connection: (3)
Send query To the server: (2)
Analysis query: (2)
Insert record: (1x record size)
Insert index: (1x index)
Close: (1)

If we execute a SQL statement for each insert, then we need to execute all steps except connection and closing N times, which is very time-consuming. There are several ways to optimize:

(1 ) Write multiple rows in each insert statement and insert in batches

(2) Write all query statements into the transaction

(3) Use Load Data to import data

The performance of each method is as follows.

Innodb engine

InnoDB provides MySQL with transaction-safe (ACID compliant) transaction (commit), rollback (rollback) and crash recovery capabilities (crash recovery capabilities) ) type table. InnoDB provides row locking (locking on row level) and foreign key constraints (FOREIGN KEY constraints).

InnoDB is designed to handle large-capacity database systems, and its CPU utilization is unmatched by other disk-based relational database engines. Technically, InnoDB is a complete database system placed on the MySQL background. InnoDB establishes its own dedicated buffer pool in main memory for caching data and indexes.

Test environment

Macbook Air 12mid apache2.2.26 php5.5.10 mysql5.6.16

Total 100W pieces of data

After insertion, the database size is 38.6MB ( No index), 46.8 (with index)

Total time spent on single insert without index: 229s Peak memory: 246KB
Total time spent on single insert with index: 242s Peak memory: 246KB
No index batch Total insertion time: 10s Peak memory: 8643KB
Total batch insertion time with index: 16s Peak memory: 8643KB
Total insertion time without index transaction: 78s Peak memory: 246KB
Total insertion time with index transaction Time consuming: 82s Peak memory: 246KB
Total time consuming for Load Data insertion without index: 12s Peak memory: 246KB
Total time consuming for Indexed Load Data insertion: 11s Peak memory: 246KB

MyIASM engine

MyISAM is the default storage engine of MySQL. The design is simple and supports full-text search.

Test environment

Macbook Air 12mid apache2.2.26 php5.5.10 mysql5.6.16

Total 100W pieces of data

After insertion, the database size is 19.1MB ( No index), 38.6 (with index)

Total time spent on single insertion without index: 82s Peak memory: 246KB
Total time spent on single insertion with index: 86s Peak memory: 246KB
Batch without index Total insertion time: 3s Peak memory: 8643KB
Total time for batch insertion with index: 7s Peak memory: 8643KB
Total time for inserting Load Data without index: 6s Peak memory: 246KB
Load Data with index Total insertion time: 8s Peak memory: 246KB

Summary

The amount of data I tested is not very large, but I can roughly understand the impact of these insertion methods on speed. The fastest one must be It is the Load Data method. This method is relatively troublesome because it involves writing files, but it can take into account both memory and speed.

Test code

<?php
$dsn = &#39;mysql:host=localhost;dbname=test&#39;;
$db = new PDO($dsn,&#39;root&#39;,&#39;&#39;,array(PDO::ATTR_PERSISTENT => true));
//删除上次的插入数据
$db->query(&#39;delete from `test`&#39;);
//开始计时
$start_time = time();
$sum = 1000000;
// 测试选项
$num = 1;
if ($num == 1){
    // 单条插入
    for($i = 0; $i < $sum; $i++){
        $db->query("insert into `test` (`id`,`name`) values ($i,&#39;tsetssdf&#39;)");
    }
} elseif ($num == 2) {
    // 批量插入，为了不超过max_allowed_packet，选择每10万插入一次
    for ($i = 0; $i < $sum; $i++) {
        if ($i == $sum - 1) { //最后一次
            if ($i%100000 == 0){
                $values = "($i, &#39;testtest&#39;)";
                $db->query("insert into `test` (`id`, `name`) values $values");
            } else {
                $values .= ",($i, &#39;testtest&#39;)";
                $db->query("insert into `test` (`id`, `name`) values $values");
            }
            break;
        }
        if ($i%100000 == 0) { //平常只有在这个情况下才插入
            if ($i == 0){
                $values = "($i, &#39;testtest&#39;)";
            } else {
                $db->query("insert into `test` (`id`, `name`) values $values");
                $values = "($i, &#39;testtest&#39;)";
            }
        } else {
            $values .= ",($i, &#39;testtest&#39;)";    
        }
    }
} elseif ($num == 3) {
    // 事务插入
    $db->beginTransaction(); 
    for($i = 0; $i < $sum; $i++){
        $db->query("insert into `test` (`id`,`name`) values ($i,&#39;tsetssdf&#39;)");
    }
    $db->commit();
} elseif ($num == 4) {
    // 文件load data
    $filename = dirname(__FILE__).&#39;/test.sql&#39;;
    $fp = fopen($filename, &#39;w&#39;);
    for($i = 0; $i < $sum; $i++){
        fputs($fp, "$i,&#39;testtest&#39;\r\n");    
    }
    $db->exec("load data infile &#39;$filename&#39; into table test fields terminated by &#39;,&#39;");
}
$end_time = time();
echo "总耗时", ($end_time - $start_time), "秒\n";
echo "峰值内存", round(memory_get_peak_usage()/1000), "KB\n";

Copy after login

Here is a brief description of the splicing form of SQL statement insertion and modification statements:

(1) insert statement splicing:

INSERT INTO user (id, name) VALUES (1, '张三');
INSERT INTO user (id, name) VALUES (2, '李思');
INSERT INTO user (id, name) VALUES ( 3, '王五');

can be spliced into:

INSERT INTO user (id, name) VALUES
(1, '张三'),
(2 , '李思'),
(3, '王五');

(2) update statement splicing:

update user set name='Zhang San' where id= '1';
update user set name='李思' where id='2';
update user set name='王五' where id='3';

can be spliced For:

update user
set name = CASE id
when 1 then '张三'
when 2 then '李思'
when 3 then '王五'
end
where id IN (1,2,3);

The above is a comparison of several MySQL methods for inserting or modifying large amounts of data. I hope it will be helpful to everyone.