First of all, I have no objection to others crawling the content of my website, and I don't strictly limit it. But some people crawl with no restraint at all: they point one script, or even several scripts running concurrently, at a single server, which is no different from a DDoS attack.
My server is currently in exactly this situation. Nonstop malicious crawling has seriously distorted our log analysis and increased the load on the server.
How can I prevent this behavior? I am using nginx. As far as I know, it can only deny a specific IP, and a denied IP still shows up in the log afterward, just with a 403 status. Besides, denying IPs by hand is too passive. Is there a way to automatically detect that the number of requests from a certain IP has risen sharply and then ban it?
1. ngx_http_limit_conn_module can be used to limit the number of connections for a single IP
http://nginx.org/en/docs/http/ngx_htt...
2. ngx_http_limit_req_module can be used to limit the number of requests per second for a single IP
http://nginx.org/en/docs/http/ngx_htt...
3. nginx_limit_speed_module can be used to limit IP speed
https://github.com/yaoweibin/nginx_li...
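To make the second option more concrete: the nginx documentation describes limit_req as using the "leaky bucket" method. Below is a minimal Python sketch of that idea only, not nginx's actual implementation. The class name, the `rate` and `burst` parameters, and the `allow` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LeakyBucketLimiter:
    """Per-IP limiter: `rate` requests/second sustained, plus `burst` extra at once."""
    rate: float
    burst: int = 0
    # ip -> (excess, time of last accepted request); illustrative bookkeeping only
    _state: dict = field(default_factory=dict, init=False, repr=False)

    def allow(self, ip: str, now: float) -> bool:
        excess, last = self._state.get(ip, (0.0, now))
        # The bucket drains at `rate` per second, then this request adds 1.
        excess = max(0.0, excess - (now - last) * self.rate) + 1.0
        if excess > self.burst + 1.0:
            return False  # over the limit: reject without updating the state
        self._state[ip] = (excess, now)
        return True


if __name__ == "__main__":
    limiter = LeakyBucketLimiter(rate=2.0)  # 2 requests per second per IP
    for t in (0.0, 0.1, 0.5):
        print(t, limiter.allow("203.0.113.7", t))
```

Each IP gets its own bucket, so one aggressive crawler is throttled without affecting other visitors. In nginx itself this behavior is configured rather than coded.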
I will also offer a solution, mainly using fail2ban (http://www.fail2ban.org/). fail2ban scans the log asynchronously to decide whether to ban an IP with iptables, so it has relatively little impact on the existing system and does not require reconfiguring nginx. However, I don't know whether it will cope if the volume of visits is very large.
First add the jail configuration to /etc/fail2ban/jail.conf. Then find /etc/fail2ban/filter.d/nginx-bansniffer.conf and change its check for 404 responses. Finally, restart the fail2ban service. With this configuration, an IP that makes more than 120 visits within 120 seconds is banned for 1 hour.
1. Prevent spider crawling based on User-Agent
2. Create rules in the operating system firewall to limit the number of simultaneous connections from the same IP
Taking iptables on Linux as an example, a rule can limit each IP to at most 15 new connections per minute. Connections beyond the limit are dropped by iptables and never reach nginx
3. Write a bash script to count the access frequency of each IP, and automatically add the IPs whose frequency exceeds the limit you set to a blacklist
For each IP in the blacklist, use a script to write it into iptables or nginx.conf automatically, and either block it for a few minutes or reduce its permitted access frequency
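The third approach can be sketched roughly as follows. The answer suggests bash; this sketch uses Python instead, assumes nginx's default "combined" access-log format, and only builds iptables commands as strings rather than running them. The function names and the `max_hits` and `window` thresholds are hypothetical and should be tuned to your own traffic.

```python
import re
from collections import defaultdict, deque
from datetime import datetime

# Assumption: "combined" log format, i.e. the client IP comes first
# and the timestamp sits in [brackets].
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def find_abusive_ips(lines, max_hits=100, window=60):
    """Return IPs with more than `max_hits` requests inside any `window` seconds."""
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    blacklist = set()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        ip = match.group(1)
        ts = datetime.strptime(match.group(2), TIME_FMT).timestamp()
        hits = recent[ip]
        hits.append(ts)
        while hits and ts - hits[0] >= window:
            hits.popleft()  # forget requests that fell out of the window
        if len(hits) > max_hits:
            blacklist.add(ip)
    return blacklist


def iptables_commands(ips):
    """Build (but do not execute) one DROP rule per blacklisted IP."""
    return [f"iptables -I INPUT -s {ip} -j DROP" for ip in sorted(ips)]


if __name__ == "__main__":
    import sys
    # Usage sketch: python3 this_script.py < access.log
    for command in iptables_commands(find_abusive_ips(sys.stdin)):
        print(command)
```

Printing the commands instead of executing them leaves room to review the list first. Removing the rules again after a few minutes, as the answer suggests, would need a separate step that is not shown here.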
When I was at Yahoo, I used an Apache module called YDoD (Yahoo! Department of Defense), which let me define custom rules to prevent outside abuse of our web services. After I moved to Taobao, it was renamed tdod. I searched around but couldn't find an open-source version. The principle, though, is similar to what I described above.
Try ngx_lua_waf
https://github.com/loveshell/ngx_lua_waf
Function: