I am writing a project related to massive data acquisition. Currently I'm using .NET Framework 4.8 and the Mysql package to initiate connections and insert data into the database server.
I will be inserting approximately 400,000 rows/second. I'm worried that the SQL connection might become a bottleneck for my program. I want to know if I use sql to create multi-threaded connection and use consumer queue to insert data, will it be faster and worth it (pros and cons)?
In my gut it would be faster, but I'm not sure how much performance it would provide in terms of thread overhead. I'm no SQL expert, so it would be great if someone could explain the pros and cons of opening multiple connections to SQL on multiple threads.
Rumors, opinions, hearsay, facts, version-related benchmarks, some personal experience, etc...
Multiple threads can improve throughput, but there are limitations:
Batch processing:
LOAD DATA
is the fastest way toINSERT
a large number of rows at once from a single thread. However, if you include the cost of writing the file toLOAD
, it may make it slower than a bulk insert.INSERT
follows. But it's capped at "hundreds" of rows when a certain limit or "diminishing returns" is reached.INSERT
query. Therefore, it (orLOAD DATA
) is worth using for high-speed ingestion. (Source: Many different timed tests.)Data Sources:
What happens after loading the data? Of course, this is not a write-only table.
PARTITIONing
Rarely useful, except to eventually clear out old data. SeePartition