How to keep search engines from including a website: place a robots.txt file in the root directory of the site to tell spiders not to crawl it, or add <meta name="robots" content="noarchive"> inside the head tag to stop search engines from displaying cached snapshots of the page.
After a new website is built, search engines are very likely to include it, as long as the content their spiders crawl is not of very poor quality. But what should you do if, for some reason, you don't want search engines to include your website? This article explains.
Method 1: Use robots.txt
You can use a robots.txt file to block search engine spiders. So what is robots.txt?
Search engines use spider programs to automatically access web pages on the Internet and obtain web page information. When a spider visits a website, it first checks whether a plain text file called robots.txt exists in the root directory of the site. This file specifies which parts of the site the spider may crawl. You can create a robots.txt file for your website and declare in it the parts that you do not want search engines to include, or specify that search engines may include only specific parts.
Please note that you only need to use a robots.txt file if your website contains content that you do not want to be indexed by search engines. If you want search engines to include all content on your site, do not create a robots.txt file.
How to use robots.txt to block search engine spiders?
Major search engines honor the robots.txt protocol by default. Create a plain text file named robots.txt, place it in the root directory of the website, and add the following lines:
User-agent: *
Disallow: /
This code tells search engines not to crawl or include this site. Use it with care: it prohibits all compliant search engines from accessing any part of the site.
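One way to sanity-check such a file before deploying it is Python's standard urllib.robotparser module. The sketch below parses the rules from a string instead of fetching them from a server; the example.com URL and the helper name is_allowed are illustrative assumptions, not something the article prescribes.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule shown above, as it would appear in robots.txt.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    for agent in ("Baiduspider", "Googlebot"):
        print(agent, is_allowed(BLOCK_ALL, agent, "https://example.com/index.html"))
```

With the blanket rule, every user agent should be reported as not allowed, which matches the warning above.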
If you only want to prohibit Baidu from including the entire site, use the following instead:
User-agent: Baiduspider
Disallow: /
If you only want to prohibit Google from including the entire site, use the following instead:
User-agent: Googlebot
Disallow: /
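A per-spider rule can be checked in the same way. The following is a minimal sketch, again using Python's standard urllib.robotparser, that reports which spiders a Baidu-only rule affects; the function name crawl_permissions and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Blocks only Baidu's spider; no rule is given for any other user agent.
BLOCK_BAIDU_ONLY = """\
User-agent: Baiduspider
Disallow: /
"""


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Hypothetical helper: map each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    result = crawl_permissions(
        BLOCK_BAIDU_ONLY,
        ["Baiduspider", "Googlebot"],
        "https://example.com/",  # placeholder URL
    )
    print(result)
```

Spiders that are not named in the file fall through to the default, which is to allow crawling, so only Baiduspider should come back as blocked here.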
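Method 2: Use a meta tag in the page code
Add the following code between the <head> and </head> tags on the home page of the website to prevent search engines from displaying a cached snapshot of the page. Note that noarchive only suppresses the snapshot; it does not by itself stop crawling or indexing, for which the noindex value is the usual choice.
<meta name="robots" content="noarchive">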
Add the following code between the <head> and </head> tags on the home page of the website to prevent the Baidu search engine from displaying a cached snapshot of the page.
<meta name="Baiduspider" content="noarchive">
Add the following code between the <head> and </head> tags on the home page of the website to prevent the Google search engine from displaying a cached snapshot of the page.
<meta name="googlebot" content="noarchive">
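To confirm that a page actually carries these tags, the HTML can be scanned with Python's standard html.parser module. This is only a sketch: the class name RobotsMetaParser, the helper robots_meta, and the sample markup are hypothetical, and the code simply lists the directives it finds instead of modeling how any particular search engine interprets them.

```python
from html.parser import HTMLParser

# Meta names used in this article; the parser reports attribute values as written.
ROBOT_META_NAMES = {"robots", "baiduspider", "googlebot"}


class RobotsMetaParser(HTMLParser):
    """Collects the content of robots-style meta tags (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ROBOT_META_NAMES:
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noarchive, nofollow".
            self.directives[name] = [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def robots_meta(html: str) -> dict[str, list[str]]:
    """Return the robots directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = (
        "<html><head>"
        '<meta name="robots" content="noarchive">'
        '<meta name="Baiduspider" content="noarchive">'
        "</head><body></body></html>"
    )
    print(robots_meta(sample))
```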