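python - After running for a while, my crawler starts getting 504 status codes continuously. Is this caused by the target site's anti-crawler strategy, and how can I avoid it?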

Question

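I recently started learning to write web crawlers in Python and am trying to scrape specific data from a site. A problem I ran into recently: after the crawler has been running for a while (judging from Fiddler's packet capture, after sending nearly 30,000 HTTP requests), the StatusCode of the HTTP responses it receives suddenly...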

PHP中文网 · Answer

This is caused by Fiddler: its proxy option is checked. I used to capture packets with Fiddler a lot, and after a while I could no longer access the network. Unchecking the proxy option solved the problem.
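If the crawler is silently picking up Fiddler as the system proxy, one way to rule that out from the Python side is to build an opener that ignores proxy settings altogether. The following is only a minimal sketch using the standard library's `urllib.request`; the helper names `build_direct_opener` and `detected_proxies` are made up for illustration, and the original crawler may well use a different HTTP library.

```python
import urllib.request


def detected_proxies():
    """Return the proxies urllib would auto-detect (environment or OS settings)."""
    return urllib.request.getproxies()


def build_direct_opener():
    """Return an opener that connects directly, ignoring any configured proxy."""
    # An empty mapping tells ProxyHandler to use no proxies at all,
    # instead of auto-detecting them from the environment or the OS.
    no_proxy = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy)


if __name__ == "__main__":
    # If Fiddler is registered as the system proxy, its local address
    # typically shows up here.
    print("auto-detected proxies:", detected_proxies())
    opener = build_direct_opener()
    # opener.open("http://example.com/") would now bypass Fiddler.
```

Comparing the result of a request made through this opener with one made through the default opener should show whether the 504 responses come from the local proxy or from the remote site.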

ringa_lee · Answer

You may want to look at an open-source component I wrote. It sets up a proxy server pool to avoid being blocked by anti-crawler strategies, automatically adjusts the request frequency, handles failed requests, and prioritizes proxies that respond quickly. https://github.com/letcheng/ProxyPool
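To make the idea of "prioritizing proxies that respond quickly" concrete, here is a toy sketch of such a pool. It is not the API of the ProxyPool project linked above; the class `SimpleProxyPool`, its methods, and the smoothing weights are all assumptions chosen for illustration.

```python
class SimpleProxyPool:
    """Toy proxy pool that hands out the proxy with the lowest average latency."""

    def __init__(self, proxies, max_failures=3):
        self.max_failures = max_failures
        # latency 0.0 means "not measured yet", so untried proxies are picked first.
        self.stats = {p: {"latency": 0.0, "failures": 0} for p in proxies}

    def get(self):
        """Return the fastest proxy that has not failed too many times in a row."""
        alive = [p for p, s in self.stats.items()
                 if s["failures"] < self.max_failures]
        if not alive:
            raise RuntimeError("no usable proxies left")
        return min(alive, key=lambda p: self.stats[p]["latency"])

    def report_success(self, proxy, elapsed):
        """Record a successful request that took `elapsed` seconds."""
        s = self.stats[proxy]
        if s["latency"] == 0.0:
            s["latency"] = elapsed
        else:
            # Exponential moving average so one slow response does not dominate.
            s["latency"] = 0.7 * s["latency"] + 0.3 * elapsed
        s["failures"] = 0

    def report_failure(self, proxy):
        """Record a failed request; repeated failures retire the proxy."""
        self.stats[proxy]["failures"] += 1
```

The crawler would call `get()` before each request and then `report_success()` or `report_failure()` afterwards, so slow or dead proxies gradually drop out of rotation.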

PHP中文网 · Answer

1. Use proxies

2. Simulate a complete request

3. Reasonable intervals

4. Disconnect and redial ADSL to get a new IP
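Points 2 and 3 in the list above could look roughly like the sketch below: each request carries browser-like headers (including a Referer), and the crawler sleeps for a randomized interval between requests. The header values, the delay numbers, and the function names `build_request`, `polite_delay`, and `crawl` are illustrative assumptions, not values known to work against any particular site.

```python
import random
import time
import urllib.request

# Example browser-like headers; the exact values are placeholders.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_request(url, referer=None):
    """Build a request that looks closer to what a browser would send."""
    headers = dict(BROWSER_HEADERS)
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def polite_delay(base=1.0, jitter=0.5):
    """Return a delay in seconds: a fixed base plus random jitter."""
    return base + random.uniform(0, jitter)


def crawl(urls, fetch=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests."""
    pages = []
    previous = None
    for url in urls:
        with fetch(build_request(url, referer=previous)) as resp:
            pages.append(resp.read())
        previous = url
        sleep(polite_delay())
    return pages
```

`fetch` and `sleep` are parameters only so the loop can be exercised without network access or real waiting; in normal use the defaults apply.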

PHPz · Answer

Method:
Change your IP by going through a proxy IP. Many free and paid ones are available online.
Free IP: http://www.uuip.net/
Paid IP: http://www.daili666.net/
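Routing requests through a proxy IP can be sketched with the standard library as follows. The address `203.0.113.10:8080` is a placeholder from a documentation-only range, not a working proxy, and `build_proxy_opener` is a hypothetical helper name.

```python
import urllib.request

# Placeholder address; substitute a proxy obtained from a provider.
PROXY_ADDRESS = "http://203.0.113.10:8080"


def build_proxy_opener(proxy_address=PROXY_ADDRESS):
    """Return an opener that sends HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({
        "http": proxy_address,
        "https": proxy_address,
    })
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener()
    # With a real proxy address, this request would leave from the proxy's IP:
    # html = opener.open("http://example.com/", timeout=10).read()
```

Free proxies tend to be unreliable, so a timeout on every request is advisable.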

迷茫 · Answer

Try accessing through a proxy

天蓬老师 · Answer

Why are the answers to this question all like this? A 50x error originates on the website's side.
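If the 504 responses really do come from the server side (504 means a gateway or proxy did not get a timely response from the upstream server), a crawler can at least back off and retry instead of hammering the site. The sketch below assumes the standard library's `urllib`; `fetch_with_backoff` and its default retry count and delays are illustrative choices only.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, open_url=urllib.request.urlopen, sleep=time.sleep,
                       max_retries=4, base_delay=1.0):
    """Fetch a URL, retrying 5xx responses with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            with open_url(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            server_side = 500 <= err.code < 600
            # Client errors (4xx) are not retried; neither is the last attempt.
            if not server_side or attempt == max_retries:
                raise
            # Wait 1s, 2s, 4s, ... (with the default base_delay) before retrying.
            sleep(base_delay * 2 ** attempt)
```

If the errors persist even after long pauses, that points away from a temporary overload and toward either a block on the site's side or a problem on the local network path.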