I've been crawling China Judgements Online (the Chinese judgment documents network), and it worked fine before: I sent a request, the server returned 200, and I processed the data in the response body.
But a week ago, all requests suddenly started returning 202 with an empty response body, so no data can be obtained at all. In the callback function I tried blocking on `while (response.status == 202)` and even sleeping, but the status never changes.
What can I do about this?
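For what it's worth, blocking inside the callback can't work: the `response` object is already final, and Scrapy will not re-fetch the URL while you wait. A non-blocking alternative (a sketch, assuming a standard Scrapy project) is to tell Scrapy's built-in `RetryMiddleware` to treat 202 as a retryable status:

```python
# settings.py -- ask Scrapy's built-in RetryMiddleware to re-schedule
# any request whose response comes back with one of these status codes.
RETRY_ENABLED = True
RETRY_HTTP_CODES = [202, 408, 429, 500, 502, 503, 504]
RETRY_TIMES = 5        # give up on a URL after 5 retries
DOWNLOAD_DELAY = 3     # back off between requests to look less aggressive
```

With this, 202 responses are re-queued automatically instead of reaching your callback, and your `parse` method only ever sees real 200 bodies.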
I use Crawlera's IP proxy service. I got 202s for a while once before, but it recovered after a day; this time it has lasted a whole week, which is very strange.
I suspect the target website is under heavy load, so I want to send requests asynchronously, but how do I correctly receive the responses in Scrapy?
This situation usually means the server considers the crawling illegitimate and has applied anti-scraping restrictions. If you are scraping legitimately, contact the site's content team and ask whether you were blocked by mistake. If you are scraping without permission, I'd advise against continuing; in serious cases there is a risk of prosecution.
If you have been blocked, you can try changing your IP address or looking for gaps in the site's anti-scraping measures.
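Changing the IP per request can be sketched as a small downloader middleware (the proxy addresses below are hypothetical; with Crawlera you would normally enable its own middleware instead of writing this yourself):

```python
import random

# Hypothetical pool of proxy endpoints -- replace with real ones.
PROXIES = [
    "http://proxy1.example.com:8010",
    "http://proxy2.example.com:8010",
    "http://proxy3.example.com:8010",
]


class RotatingProxyMiddleware:
    """Downloader middleware that assigns a random proxy to each request."""

    def process_request(self, request, spider):
        # Scrapy's HttpProxyMiddleware reads this meta key and routes
        # the request through the chosen proxy.
        request.meta["proxy"] = random.choice(PROXIES)
        return None  # let the request continue through the chain
```

It would be enabled via `DOWNLOADER_MIDDLEWARES` in `settings.py`. Note that if the block is based on behavior (request rate, missing cookies) rather than IP, rotating proxies alone won't fix the 202s.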