How to optimize and migrate 20 million data in MySQL-Mysql Tutorial-php.cn

How to optimize and migrate 20 million data in MySQL

怪我咯

Release： 2017-04-05 13:20:32

Original

1829 people have browsed it

Recently there is a data table with 20 million records that needs to be optimized and migrated. 2000W data is embarrassing for MySQL, because index## is created reasonably #The speed is still quite fast. No matter how much optimization is done, the speed will not be improved much. However, these data have a lot of redundant fields and error messages, which is very inconvenient for statistics and analysis, so I need to create one. Create a new table, take out the data from the old table one by one, optimize it, and put it back into the new table;

1. Clear redundant data and optimize the field structure In

2000W data, we already know the fields that can be used as

query conditions, so we create new fields separately for this part of the data, and reasonably change the field structure for regular data, such as ID cards, which are varchar. (18). After merging unimportant data, we will have a field with a text structure.

We need to calculate some related data, such as ID card type to obtain accurate gender and birth. Place, birthday, age.

2. Data migration

We take out an old data from the database, and then get the new data we want through calculation and processing. Insert new data into the new table. However, the following problems are encountered when obtaining new data.

We can obtain it in batches through MySQL's limit syntax. For example, to obtain 50,000 each time, the SQL statement is as follows:
```
select * from table_name limit 15000000,50000;
```
Copy after login
This method can solve the problem of too large amount of data, but with the limit The first parameter is getting larger and larger, and the query speed will be horribly slow (the execution of the above SQL will take 35 seconds), so we started to optimize the SQL statement, and after optimization it became as follows:
```
select * from table_name order by id desc limit 5000000,50000;
```
Copy after login
.
2000W data can be split by dichotomy. When 1000W data is executed, the data is reversed. After optimization, the SQL execution efficiency is significantly improved, from 35 seconds to 9 seconds;

But it is still very slow. Time is life... Fortunately, we have an auto-increment ID (the first law of creating a data table, there must be an auto-increment field). The optimized SQl is as follows:
```
1. select * from table_name where id>15000000 and id<15050000； 2. select * from table_name where id>15000000 limit 50000;
```
Copy after login
For visual demonstration, I wrote two SQL with the same function. Compared with the first one, the limit of the second SQL will cause the index hit to be worse, and the efficiency will also be reduced. The execution time of the first SQL is 2 milliseconds, and the execution time of the second SQL is 5 milliseconds. (I took the average value). The query speed of each data dropped directly from 35 seconds to 2 milliseconds...
We have three options to store new data into the new table, as follows:

；At first I would definitely think that this solution would not work, because every insertion will have a database IO operation. However, one advantage of this solution is that it can detect problematic data in time and continue execution after modification; using "binding
variable
" in Oracle can improve performance, and MySQL also provides "binding" variable' function. So without changing the logic, try to optimize the data storage speed. The code is as follows:
```
public function actionTest(array $data)
{
    $mysqli = new mysqli("192.168.1.106", "username", "password", "test");
    $sql = "insert into table_name(name,identity) values (?,?)";

    $stmt = $connection->prepare($sql);
    $name = "";
    $identity = "";
    //使用绑定变量
    $stmt->bind_param("si", $name, $identity);
    foreach($data as $val)
    {
        $name = $val[name];
        $identity = $val[card_id];
        //执行
        $stmt->execute();
    }
    $stmt->close();
}
```
Copy after login
The final effect is not very good. MySQL's "bind variable" does not bring obvious speed improvement, but it can effectively prevent SQL injection;
This is the solution I finally selected. First, it can detect problematic data in time, and second, the imported data is very stable. Just like supporting breakpoint resumption, you can see the effect at every step. When executing the script, you can also start writing analysis logic simultaneously;
It is also good to assemble a large SQL file and finally import it through the tool that comes with MySQL. But if there is a problem with one of the SQL statements, you may need to rerun the script. Because it is very painful to modify a symbol in a 9G text file...

3. Summary

Through each This kind of optimization finally reduced the script execution time to less than 20 minutes. After optimization, the data quality has been highly guaranteed. Next time we will try to optimize and migrate 200 million data...

The above is the detailed content of How to optimize and migrate 20 million data in MySQL. For more information, please follow other related articles on the PHP Chinese website!