python - How can I stop my own website from being scraped by crawlers?
大家讲道理 2017-04-17 17:33:35

How can I stop my own website from being scraped by crawlers? What methods are there?


Replies (13)
黄舟

If you are defending against targeted crawlers, you can add some access restrictions, such as limiting request frequency or adding CAPTCHAs.
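A minimal sketch of the frequency-limiting idea, using only the standard library: allow each client IP at most `max_requests` requests per sliding window, and serve a CAPTCHA or an HTTP 429 once the limit is exceeded. The class and method names here are illustrative, not a real framework API.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window, per-IP request limiter (illustrative sketch)."""

    def __init__(self, max_requests=10, window=60.0):
        self.max_requests = max_requests
        self.window = window                 # window length in seconds
        self.hits = defaultdict(deque)       # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Discard timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False                     # over the limit: show CAPTCHA / return 429
        q.append(now)
        return True

limiter = RateLimiter(max_requests=3, window=60.0)
print(limiter.allow("1.2.3.4", now=0.0))     # first request passes
```

In a real deployment this state would live in something shared like Redis rather than process memory, since crawlers also rotate IPs to dodge exactly this kind of counter.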

阿神
  1. Render important content dynamically with JS

  2. Limit by http_referer

  3. Use different templates for different interfaces, the kind that a single set of regular expressions cannot match perfectly

  4. Add some random copyright information to content that may be scraped

  5. Require login before content can be viewed

  6. Record access logs

That’s all I can think of, but if someone really wants to scrape you, these measures will only make it a little harder.
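Point 2 above (limiting by `http_referer`) can be sketched as a small header check: reject requests whose `Referer` is missing or points at another domain. The allowed domain here is a made-up example, and note the Referer header is trivially forged, so this only raises the bar as the answer says.

```python
from urllib.parse import urlparse

# Hypothetical allow-list for your own site; replace with your real domains.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def referer_ok(headers):
    """Return True only if the Referer header names an allowed host."""
    referer = headers.get("Referer", "")
    if not referer:
        return False                      # no Referer at all: likely a bot
    host = urlparse(referer).hostname or ""
    return host in ALLOWED_HOSTS

print(referer_ok({"Referer": "https://example.com/list?page=2"}))
```

In practice you would apply this only to the data endpoints worth protecting (point 1's JS-loaded APIs), since some privacy-conscious browsers strip the Referer for legitimate users too.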

小葫芦

You can edit robots.txt to disallow search-engine crawling.
Blocking individual crawlers completely is difficult; you can only raise the difficulty, e.g. with more complex CAPTCHAs, access-frequency limits, and regularly changing page styles/data formats.
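Serving `User-agent: *` / `Disallow: /` in robots.txt asks all crawlers to stay away; compliant bots (search engines) honor it, hostile ones simply ignore it. The snippet below shows how a well-behaved crawler would evaluate those two lines, using the standard library's `urllib.robotparser`; the bot name and URL are made up for illustration.

```python
from urllib import robotparser

# Parse a robots.txt that disallows everything for every user agent.
rules = robotparser.RobotFileParser()
rules.parse([
    "User-agent: *",   # applies to all crawlers
    "Disallow: /",     # forbid the entire site
])

# A compliant crawler checks before fetching:
print(rules.can_fetch("SomeBot", "https://example.com/page"))  # → False
```

This is purely advisory, which is why the answer pairs it with CAPTCHAs and frequency limits for crawlers that do not play by the rules.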
