This article explains how the robots.txt file is used in Empire CMS. The analysis is as follows:
Before discussing the robots.txt file in Empire CMS, let me first explain what robots.txt does.
The Robots protocol (also known as the crawler protocol, crawler rules, or the robot protocol) is implemented through robots.txt. Through it, a website tells search engines which pages may be crawled and which may not. The Robots protocol is a widely accepted code of conduct in the international Internet community. Its purpose is to protect website data and sensitive information and to ensure that users' personal information and privacy are not infringed. Because it is not a command, search engines have to comply with it voluntarily. Malware and other malicious crawlers often obtain website back-end data and personal information simply by ignoring the Robots protocol.
The robots.txt file is a plain text file that can be created and edited with any common text editor, such as the Notepad that ships with Windows. robots.txt is a protocol, not a command. It is the first file a search engine looks at when visiting a website, and it tells the spider which files on the server may be viewed.
When a search spider visits a site, it first checks whether robots.txt exists in the root directory of the site. If it exists, the spider determines the scope of its access from the contents of the file; if it does not exist, every search spider can access all pages on the website that are not password protected. Baidu officially recommends using a robots.txt file only when your website contains content that you do not want search engines to index. If you want search engines to include all content on your site, do not create a robots.txt file.
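The lookup described above can be sketched in Python with the standard library's urllib.robotparser module. This is only a minimal illustration of the behaviour, not how any particular search engine implements it; the helper names robots_url and may_crawl and the example.com addresses are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives in the root directory of the site,
    # whichever page the spider originally wanted to fetch.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_body: Optional[str], user_agent: str, page_url: str) -> bool:
    """Decide whether a well-behaved spider may fetch page_url.

    robots_body is the text of robots.txt, or None if the file does not exist.
    """
    if robots_body is None:
        # No robots.txt on the site: nothing is declared off limits.
        return True
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    # Hypothetical site and rules, for illustration only.
    rules = "User-agent: *\nDisallow: /e/data/\n"
    print(robots_url("http://example.com/news/1.html?page=2"))
    print(may_crawl(rules, "Baiduspider", "http://example.com/e/data/x.php"))
    print(may_crawl(None, "Baiduspider", "http://example.com/e/data/x.php"))
```

Note that the decision is made entirely by the spider itself: a crawler that never calls a check like may_crawl is not stopped by anything in robots.txt.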
If you think of a website as a room in a hotel, robots.txt is the "Do Not Disturb" or "Please Clean the Room" sign that the owner hangs on the door. This file tells visiting search engines which rooms may be entered and visited, and which rooms are not open to search engines because they store valuables or may involve the privacy of residents and visitors. But robots.txt is not a command, nor is it a firewall, just as a doorman cannot stop malicious intruders such as thieves.
The default robots.txt of Empire CMS is as follows:
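#
# robots.txt for EmpireCMS
#
User-agent: *          # the rules below apply to all search engines
Disallow: /d/          # block all search engines from crawling the /d/ directory
Disallow: /e/class/    # block all search engines from crawling the /e/class/ directory
Disallow: /e/data/     # block all search engines from crawling the /e/data/ directory
Disallow: /e/enews/    # block all search engines from crawling the /e/enews/ directory
Disallow: /e/update/   # block all search engines from crawling the /e/update/ directory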
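To see what these default rules mean for a compliant crawler, one option is to feed them to Python's standard urllib.robotparser and ask about a few paths. This is a rough sketch: the function name is_allowed and the sample paths are made up for illustration and are not taken from Empire CMS itself.

```python
from urllib.robotparser import RobotFileParser

# The default Empire CMS rules quoted above, without the annotations.
EMPIRE_DEFAULT_ROBOTS = """\
#
# robots.txt for EmpireCMS
#
User-agent: *
Disallow: /d/
Disallow: /e/class/
Disallow: /e/data/
Disallow: /e/enews/
Disallow: /e/update/
"""


def is_allowed(path: str, user_agent: str = "*") -> bool:
    """Return True if a compliant crawler may fetch path under the default rules."""
    parser = RobotFileParser()
    parser.parse(EMPIRE_DEFAULT_ROBOTS.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Illustrative paths only; substitute paths from your own site.
    for path in ("/index.html", "/d/file/a.jpg", "/e/data/config.php", "/e/other/"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Each Disallow line is a path prefix, so everything underneath the five listed directories is blocked, while the rest of the site, including other directories under /e/, stays open to crawlers.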
I hope this article is helpful to everyone building websites with Empire CMS.