How to use Python to read tens of millions of rows of data and automatically write them to a MySQL database

WBOY

Release: 2023-05-30 11:55:28


Scenario 1: Data does not need to be written to MySQL frequently

Use the import wizard in Navicat. It supports a variety of file formats and can either create a table automatically from the fields in the file or insert the data into an existing table, which is fast and convenient.


Scenario 2: The data is incremental and needs to be written to MySQL automatically and frequently

Test data: CSV format, about 12 million rows

import pandas as pd
data = pd.read_csv('./tianchi_mobile_recommend_train_user.csv')
data.shape


Print results (a screenshot in the original article): the shape of the DataFrame, about 12 million rows.

Method 1: Python + pymysql library

Command to install pymysql:

pip install pymysql


Code implementation:

import pandas as pd
import pymysql

# Database connection settings
conn = pymysql.connect(
    host='127.0.0.1',
    user='root',
    password='wangyuqing',
    database='test01',
    port=3306,
    charset='utf8')
# Number of rows per chunk
big_size = 100000
columns = ['user_id', 'item_id', 'behavior_type', 'item_category', 'time']
# One %s placeholder per column; the users table must already exist
_values = ",".join(['%s'] * len(columns))
sql = "insert into users(%s) values(%s)" % (",".join(columns), _values)
cursor = conn.cursor()
# Read the CSV chunk by chunk and write each chunk to MySQL
with pd.read_csv('./tianchi_mobile_recommend_train_user.csv', chunksize=big_size) as reader:
    for df in reader:
        print('Processing:', len(df))
        # Build one plain tuple per row; itertuples is much faster than iterrows
        datas = list(df[columns].itertuples(index=False, name=None))
        cursor.executemany(sql, datas)
        conn.commit()
# Close the cursor first, then the connection
cursor.close()
conn.close()
print('Saved successfully!')
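
The statement-building step in Method 1 can be pulled out into a small helper. The following is a minimal sketch, not part of the original article: the function name `build_insert_sql` is hypothetical, and it assumes the table and column names come from trusted code, because identifiers cannot be passed as query parameters.

```python
def build_insert_sql(table, columns):
    """Return a parameterized INSERT statement for a DB-API driver such as pymysql.

    `table` and `columns` must come from trusted code, never from user input,
    because they are formatted directly into the SQL text.
    """
    column_list = ", ".join(columns)
    # One %s placeholder per column; the driver fills these in safely.
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


columns = ["user_id", "item_id", "behavior_type", "item_category", "time"]
sql = build_insert_sql("users", columns)
print(sql)
```

The resulting string can be passed to `cursor.executemany(sql, datas)` exactly as in the code above, with one tuple in `datas` per row.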


Method 2: pandas + sqlalchemy

pandas relies on sqlalchemy for its SQL support. With sqlalchemy in place, pandas can run queries, writes and other operations against all common database types.

Code implementation:

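from sqlalchemy import create_engine
# pandas is already imported as pd above
# Connection URL format: mysql+pymysql://user:password@host:port/database
engine = create_engine('mysql+pymysql://root:wangyuqing@localhost:3306/test01')
data = pd.read_csv('./tianchi_mobile_recommend_train_user.csv')
# Write 100000 rows per batch; index=False leaves out the DataFrame index column
data.to_sql('user02', engine, chunksize=100000, index=False)
print('Saved successfully!')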
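
Method 2 as written loads the whole CSV into memory before writing it. For files that do not fit comfortably in memory, one possible variation is to read the file in chunks and append each chunk to the table. This is a sketch under assumptions, not the article's method: the function name `csv_to_sql_in_chunks` is hypothetical, and it relies on `if_exists="append"` to create the table on the first chunk and add rows to it afterwards.

```python
import pandas as pd


def csv_to_sql_in_chunks(csv_path, table, con, chunksize=100000):
    """Append a CSV file to a SQL table one chunk at a time.

    `con` is anything DataFrame.to_sql accepts, for example the SQLAlchemy
    engine created in Method 2. Returns the number of rows written.
    """
    total = 0
    with pd.read_csv(csv_path, chunksize=chunksize) as reader:
        for chunk in reader:
            # "append" creates the table if it is missing, then adds rows to it.
            chunk.to_sql(table, con, if_exists="append", index=False)
            total += len(chunk)
    return total


# Hypothetical usage with the engine from Method 2:
# from sqlalchemy import create_engine
# engine = create_engine('mysql+pymysql://root:wangyuqing@localhost:3306/test01')
# csv_to_sql_in_chunks('./tianchi_mobile_recommend_train_user.csv', 'user02', engine)
```

Because every chunk is appended, running the function twice against the same table writes the data twice, so a rerun should start from an empty or freshly dropped table.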


Summary

In this test, the pymysql method took 12 minutes and 47 seconds and needed considerably more code, while the pandas method needed only five lines of code and took about 4 minutes. Note also that the first method requires the table to be created in advance, whereas the second does not. The second method is therefore recommended, as it is both more convenient and faster. If it is still too slow, consider adding multiprocessing or multithreading.
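
The multithreading suggestion could look roughly like the sketch below. It is an illustration under assumptions, not a measured improvement: `split_into_batches`, `write_in_parallel` and `fake_write_batch` are hypothetical names, and the real `write_batch` callable would have to open its own database connection, since a single pymysql connection should not be shared between threads. Whether this actually speeds things up depends on where the bottleneck is.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_in_parallel(rows, write_batch, batch_size=100000, workers=4):
    """Send each batch to `write_batch` on a thread pool; return rows written.

    `write_batch` is supplied by the caller and must return the number of
    rows it wrote. For MySQL it should open its own connection, run
    executemany, commit, and close the connection again.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() re-raises any exception from a worker when results are read.
        counts = pool.map(write_batch, split_into_batches(rows, batch_size))
        return sum(counts)


# Stand-in writer for demonstration only: it just counts the rows it receives.
def fake_write_batch(batch):
    return len(batch)


total = write_in_parallel(list(range(250)), fake_write_batch, batch_size=100)
print(total)
```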

To recap, the three ways of storing data in a MySQL database covered here:

Direct import, using the import wizard in Navicat
Python + pymysql
pandas + sqlalchemy
