The goal here is to speed up the insertion of over 1.3 million rows into an MS SQL Server database using pyodbc; with the current row-by-row approach, inserting 300,000 rows takes around 40 minutes. Based on the code provided, the following approaches are recommended to improve the insertion speed:
The T-SQL BULK INSERT command is specifically designed for efficient bulk data loading. However, it requires the source file to be located on the same machine as the SQL Server instance or in a network location accessible through SMB/CIFS.
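Since the load is driven from Python, a BULK INSERT statement can still be issued through pyodbc. The following is a minimal sketch, not the original author's code: the connection string, table name (dbo.target_table), file path, and CSV layout are assumptions, and the file path is resolved on the server, not on the client:

<code class="python">import pyodbc

# Hypothetical connection string and object names for illustration only.
cnxn = pyodbc.connect('connection string', autocommit=True)
crsr = cnxn.cursor()

# The file path is interpreted by the SQL Server service, so the file must be
# readable from the server machine (or a share it can reach).
bulk_sql = r"""
BULK INSERT dbo.target_table
FROM 'C:\data\rows_to_load.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
"""
crsr.execute(bulk_sql)</code>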
Pyodbc 4.0.19 introduced the fast_executemany feature in its Cursor class. When enabled, this feature optimizes the execution of executemany queries, which involve inserting multiple rows of data.
The following code demonstrates how to use fast_executemany:
<code class="python">import pyodbc
import time

conn_str = 'connection string'  # placeholder; substitute your actual connection string
cnxn = pyodbc.connect(conn_str, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("TRUNCATE TABLE fast_executemany_test")

sql = "INSERT INTO fast_executemany_test (txtcol) VALUES (?)"
params = [(f'txt{i:06d}',) for i in range(1000)]

# Default behavior: executemany issues one round trip per row
t0 = time.perf_counter()
crsr.executemany(sql, params)
print(f'{time.perf_counter() - t0:.1f} seconds')

# With fast_executemany enabled: parameters are sent to the server in bulk
crsr.fast_executemany = True
t0 = time.perf_counter()
crsr.executemany(sql, params)
print(f'{time.perf_counter() - t0:.1f} seconds')</code>
In the code above, the second executemany call, run with fast_executemany enabled, completes in a fraction of the time of the first, because pyodbc packs the parameters into arrays and sends them to the server in far fewer round trips.
Instead of iterating over the rows one by one, consider accumulating the data in a list (or NumPy array) and then inserting the entire collection with a single executemany call. This eliminates the per-row overhead of calling execute once for every row.
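As a rough sketch of this pattern, assuming a hypothetical target_table with two text columns and a source_data.csv file on the client machine (neither of which comes from the original code):

<code class="python">import csv
import pyodbc

# Hypothetical connection string and object names for illustration only.
cnxn = pyodbc.connect('connection string', autocommit=False)
crsr = cnxn.cursor()
crsr.fast_executemany = True

# Build the full parameter list up front instead of executing row by row.
with open('source_data.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    params = [(row[0], row[1]) for row in reader]

# One executemany call sends all rows; commit once at the end.
crsr.executemany(
    "INSERT INTO target_table (col1, col2) VALUES (?, ?)",
    params,
)
cnxn.commit()</code>

For very large loads, the same idea can be applied in batches (for example, 10,000 rows per executemany call) to keep memory use and transaction size bounded.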
By implementing these optimizations, it is possible to drastically enhance the performance of bulk insert operations in MS SQL Server using pyodbc.