Use a thread pool or distributed processing. Once a piece of data has been processed, mark it somewhere, writing the marker in real time; if the job fails, you can restart from the beginning and filter out the data already marked as done.
You can also set a timeout, save the failed records separately, and retry them later with a suitable strategy.
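A minimal sketch of that mark-and-retry idea, assuming a hypothetical processRecord call, a Row type, and a markDone helper that flips a done flag in the table (all names here are illustrative, not from the original question):

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.*;

public class MarkAndRetry {
    // Illustrative record type; in practice this is whatever you load from the table.
    record Row(long id, String payload) {}

    // Stand-ins for the real work: process one record, and persist a "done" flag.
    static boolean processRecord(Row row) { return true; }
    static void markDone(long id) { /* e.g. UPDATE t SET done = 1 WHERE id = ? */ }

    public static void main(String[] args) throws InterruptedException {
        List<Row> pending = List.of(); // load rows WHERE done = 0
        Queue<Row> failed = new ConcurrentLinkedQueue<>();

        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (Row row : pending) {
            pool.submit(() -> {
                try {
                    if (processRecord(row)) {
                        markDone(row.id()); // mark progress in real time
                    } else {
                        failed.add(row);    // save failures for a retry pass
                    }
                } catch (Exception e) {
                    failed.add(row);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // overall timeout for the whole batch
        // Retry strategy: re-submit the contents of `failed`, or persist them and
        // rerun the job, which skips everything already marked done.
    }
}
```

On a rerun, the initial query only selects rows whose flag is unset, so completed work is skipped automatically.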
Judging by your tone, you just want to process these 1 million records once rather than keep this program around long-term, so the simplest brute-force approach is to launch a few extra processes and pass each one a couple of command-line parameters. Say there are 10 processes: each process is then responsible for 100,000 records. Just split the data among the processes by some concrete rule, for example an ID range.
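A brute-force sketch of that multi-process split, assuming the shard number and shard count are passed on the command line (ShardWorker and the table layout are hypothetical):

```java
public class ShardWorker {
    public static void main(String[] args) {
        int shard  = Integer.parseInt(args[0]); // this process's index, 0..9
        int shards = Integer.parseInt(args[1]); // total number of processes, e.g. 10
        long total = 1_000_000L;
        long perShard = total / shards;         // 100,000 records each

        long fromId = shard * perShard + 1;     // inclusive lower bound
        long toId   = (shard == shards - 1)     // last shard absorbs any remainder
                    ? total
                    : fromId + perShard - 1;

        // SELECT ... FROM t WHERE id BETWEEN fromId AND toId, then process that range.
        System.out.printf("shard %d handles ids %d..%d%n", shard, fromId, toId);
    }
}
```

You would then launch `java ShardWorker 0 10` through `java ShardWorker 9 10`, one process per ID range.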
Divide the 1 million records into several chunks, have each thread process one chunk, and make sure the data is inserted in batches. If the table has indexes, it's best to disable them first and rebuild them once all the data has been inserted. If it's Oracle, you can load the data directly with SQL*Loader or a similar tool, avoiding Hibernate and the like.
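For the plain-JDBC path, a batched insert might look like the sketch below; the connection URL, the table t(id, payload), and the batch size of 1,000 are all placeholder assumptions:

```java
import java.sql.*;

public class BatchInsert {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//host:1521/service"; // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
            conn.setAutoCommit(false); // commit per batch, not per row
            String sql = "INSERT INTO t (id, payload) VALUES (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (long id = 1; id <= 1_000_000L; id++) {
                    ps.setLong(1, id);
                    ps.setString(2, "row-" + id);
                    ps.addBatch();
                    if (id % 1_000 == 0) { // flush every 1,000 rows
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();         // flush the final partial batch
                conn.commit();
            }
        }
    }
}
```

Disable the indexes before running and rebuild them afterwards, e.g. ALTER INDEX ... UNUSABLE / REBUILD on Oracle.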
Use a thread pool and record each thread's starting offset, then start reading the data. Use batch updates to write back to the table. Call the API asynchronously and update the table as the calls return. If that feels too complicated, just call the API and update the table sequentially, one record at a time.
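A sketch of the asynchronous variant, assuming hypothetical callApi and batchUpdate helpers; each task owns one chunk of IDs and flushes its results back in a single batch:

```java
import java.util.*;
import java.util.concurrent.*;

public class AsyncBatchUpdate {
    // Stand-ins for the real API call and the batched write-back.
    static String callApi(long id) { return "result-" + id; }
    static void batchUpdate(Map<Long, String> results) { /* one UPDATE ... WHERE id IN (...) */ }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        long start = 1, end = 1_000_000L, chunk = 1_000;

        for (long lo = start; lo <= end; lo += chunk) {
            final long from = lo, to = Math.min(lo + chunk - 1, end);
            // Each task records its own starting point (from) and works through
            // its chunk, then writes the whole chunk back in one batch update.
            CompletableFuture.runAsync(() -> {
                Map<Long, String> results = new HashMap<>();
                for (long id = from; id <= to; id++) {
                    results.put(id, callApi(id));
                }
                batchUpdate(results);
            }, pool);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

The sequential fallback is the same loop without the thread pool: call the API, update the row, move on.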