Use a thread pool or distributed processing. Once a piece of data has been processed, mark it somewhere, writing the marker in real time; if the job fails, you can restart from the beginning and filter out the data already marked as done.
You can also set a timeout, save the failed records separately, and retry them later with a suitable strategy.
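A minimal sketch of that mark-and-retry idea, assuming a hypothetical processRecord call, a Row type, and a markDone helper that flips a done flag in the table (all names here are illustrative, not from the original question):

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.*;

public class MarkAndRetry {
    // Illustrative record type; in practice this is whatever you load from the table.
    record Row(long id, String payload) {}

    // Stand-ins for the real work: process one record, and persist a "done" flag.
    static boolean processRecord(Row row) { return true; }
    static void markDone(long id) { /* e.g. UPDATE t SET done = 1 WHERE id = ? */ }

    public static void main(String[] args) throws InterruptedException {
        List<Row> pending = List.of(); // load rows WHERE done = 0
        Queue<Row> failed = new ConcurrentLinkedQueue<>();

        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (Row row : pending) {
            pool.submit(() -> {
                try {
                    if (processRecord(row)) {
                        markDone(row.id()); // mark progress in real time
                    } else {
                        failed.add(row);    // save failures for a retry pass
                    }
                } catch (Exception e) {
                    failed.add(row);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // overall timeout for the whole batch
        // Retry strategy: re-submit the contents of `failed`, or persist them and
        // rerun the job, which skips everything already marked done.
    }
}
```

On a rerun, the initial query only selects rows whose flag is unset, so completed work is skipped automatically.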
Judging by your tone, you just want to process these 1 million records once rather than keep this program around long-term, so the simplest brute-force approach is to launch a few extra processes and pass each one a couple of command-line parameters. Say there are 10 processes: each process is then responsible for 100,000 records. Just split the data among the processes by some concrete rule, for example an ID range.
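A brute-force sketch of that multi-process split, assuming the shard number and shard count are passed on the command line (ShardWorker and the table layout are hypothetical):

```java
public class ShardWorker {
    public static void main(String[] args) {
        int shard  = Integer.parseInt(args[0]); // this process's index, 0..9
        int shards = Integer.parseInt(args[1]); // total number of processes, e.g. 10
        long total = 1_000_000L;
        long perShard = total / shards;         // 100,000 records each

        long fromId = shard * perShard + 1;     // inclusive lower bound
        long toId   = (shard == shards - 1)     // last shard absorbs any remainder
                    ? total
                    : fromId + perShard - 1;

        // SELECT ... FROM t WHERE id BETWEEN fromId AND toId, then process that range.
        System.out.printf("shard %d handles ids %d..%d%n", shard, fromId, toId);
    }
}
```

You would then launch `java ShardWorker 0 10` through `java ShardWorker 9 10`, one process per ID range.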
Divide the 1 million records into several chunks, have each thread process one chunk, and make sure the data is inserted in batches. If the table has indexes, it's best to disable them first and rebuild them once all the data has been inserted. If it's Oracle, you can load the data directly with SQL*Loader or a similar tool, avoiding Hibernate and the like.
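For the plain-JDBC path, a batched insert might look like the sketch below; the connection URL, the table t(id, payload), and the batch size of 1,000 are all placeholder assumptions:

```java
import java.sql.*;

public class BatchInsert {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//host:1521/service"; // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
            conn.setAutoCommit(false); // commit per batch, not per row
            String sql = "INSERT INTO t (id, payload) VALUES (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (long id = 1; id <= 1_000_000L; id++) {
                    ps.setLong(1, id);
                    ps.setString(2, "row-" + id);
                    ps.addBatch();
                    if (id % 1_000 == 0) { // flush every 1,000 rows
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();         // flush the final partial batch
                conn.commit();
            }
        }
    }
}
```

Disable the indexes before running and rebuild them afterwards, e.g. ALTER INDEX ... UNUSABLE / REBUILD on Oracle.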
Use a thread pool and record each thread's starting offset, then start reading the data. Use batch updates to write back to the table. Call the API asynchronously and update the table as the calls return. If that feels too complicated, just call the API and update the table sequentially, one record at a time.
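A sketch of the asynchronous variant, assuming hypothetical callApi and batchUpdate helpers; each task owns one chunk of IDs and flushes its results back in a single batch:

```java
import java.util.*;
import java.util.concurrent.*;

public class AsyncBatchUpdate {
    // Stand-ins for the real API call and the batched write-back.
    static String callApi(long id) { return "result-" + id; }
    static void batchUpdate(Map<Long, String> results) { /* one UPDATE ... WHERE id IN (...) */ }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        long start = 1, end = 1_000_000L, chunk = 1_000;

        for (long lo = start; lo <= end; lo += chunk) {
            final long from = lo, to = Math.min(lo + chunk - 1, end);
            // Each task records its own starting point (from) and works through
            // its chunk, then writes the whole chunk back in one batch update.
            CompletableFuture.runAsync(() -> {
                Map<Long, String> results = new HashMap<>();
                for (long id = from; id <= to; id++) {
                    results.put(id, callApi(id));
                }
                batchUpdate(results);
            }, pool);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

The sequential fallback is the same loop without the thread pool: call the API, update the row, move on.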