Bulk insert operations can significantly enhance the performance of inserting large datasets into Microsoft SQL Server. This article explores alternative approaches to optimize such insertions, addressing the specific challenges faced by the code provided in the question.
Fast Executemany (Pyodbc 4.0.19): Recent versions of Pyodbc (4.0.19 ) offer the Cursor#fast_executemany feature, designed to expedite the execution of multiple-row inserts. By setting crsr.fast_executemany to True, you can potentially gain a significant performance boost compared to the default executemany method.
<code class="python"># Connect to the database and create a cursor with fast_executemany enabled cnxn = pyodbc.connect(conn_str, autocommit=True) crsr = cnxn.cursor() crsr.fast_executemany = True # Execute the bulk insert operation with parameters sql = "INSERT INTO table_name (column1, column2) VALUES (?, ?)" params = [(data1, data2) for (record_id, data1, data2) in data] crsr.executemany(sql, params)</code>
Iterating Using Pandas DataFrame: Alternatively, you could use Pandas to read your CSV data into a DataFrame and leverage its optimized to_sql() method. This approach streamlines data insertion and supports various optimizations, such as chunking and type conversions.
<code class="python">import pandas as pd # Read CSV data into a DataFrame df = pd.read_csv(csv_file) # Establish a database connection engine = sqlalchemy.create_engine(conn_str) # Insert DataFrame into the database using `to_sql()` df.to_sql('table_name', con=engine, if_exists='append', index=False)</code>
Bulk Copy Interface (BCP): The Bulk Copy Interface (BCP) is a native SQL Server utility that allows for high-speed data transfer between files and database tables. BCP offers several performance advantages over standard SQL INSERT statements.
bcp {table_name} in {csv_file} -S {server} -d {database} -E
The optimal approach for your specific scenario depends on factors such as data size, server configuration, and available resources. Generally, fast_executemany provides a significant performance improvement over iterating via a cursor, while BCP often outperforms both in bulk insert scenarios.
The above is the detailed content of How to Speed Up Bulk Inserts into MS SQL Server Using Pyodbc?. For more information, please follow other related articles on the PHP Chinese website!