What is causing the crawler to block? You can analyze it in the following ways:
1. Capture packets to check whether the stall is caused by the network.
2. Which framework is the crawler written with, urllib2 or Scrapy? Check its logs.
3. Check whether the URL pool has been exhausted, with no new target tasks being added to the crawl queue.
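For point 3, a minimal sketch of how you might detect a drained URL pool, assuming a standard `queue.Queue` feeds your workers (the `drain_or_stall` helper and its timeout are hypothetical, not part of any crawler framework):

```python
import queue

def drain_or_stall(task_q: "queue.Queue[str]", timeout: float = 30.0) -> list:
    """Pull URLs until the queue stays empty for `timeout` seconds.

    Returns the URLs that were handed out. If this returns while the
    crawl is supposedly still running, the pool was exhausted and no
    new target tasks were being added to the crawl queue.
    """
    seen = []
    while True:
        try:
            seen.append(task_q.get(timeout=timeout))
        except queue.Empty:
            # No new URL arrived within the timeout: the pool is drained.
            return seen
```

In a real crawler you would log this condition instead of returning, so the stall shows up in the logs mentioned in point 2.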
You can also use multi-threading, with each thread processing one month's data. That way, even if one month's data has a problem, most of the data remains intact, and the problematic month can then be analyzed in detail afterwards.
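The one-thread-per-month idea can be sketched like this (a minimal illustration: `crawl_month` is a hypothetical placeholder for your real per-month crawl, not an API of urllib2 or Scrapy):

```python
import threading

def crawl_month(month: str, results: dict, lock: threading.Lock) -> None:
    # Placeholder for the real per-month crawl; it only records the
    # month here so the threading structure is clear.
    data = f"data for {month}"  # replace with actual fetching/parsing
    with lock:
        results[month] = data

def crawl_year(months: list) -> dict:
    """Run one thread per month; months missing from the returned dict
    are the ones that failed and should be re-examined in detail."""
    results: dict = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=crawl_month, args=(m, results, lock))
               for m in months]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each month runs independently, one failing thread loses only that month's data; comparing the returned keys against the full month list tells you which month to debug.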