How to write robots.txt
In China, website managers do not seem to pay much attention to robots.txt, but some functions cannot be achieved without it, so today Shijiazhuang SEO would like to briefly talk about how to write robots.txt.
Basic introduction to robots.txt
robots.txt is a plain text file. In this file, website administrators can declare the parts of the website that they do not want robots to access, or specify that search engines should only include the specified content.
When a search robot (sometimes called a search spider) visits a site, it will first check whether robots.txt exists in the root directory of the site. If the file exists, the robot determines its scope of access based on the file's contents; if it does not exist, the robot simply crawls along the site's links.
In addition, robots.txt must be placed in the root directory of a site, and the file name must be all lowercase.
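For example, using a hypothetical domain (example.com stands in for your own site), the file has to sit at the site root and nowhere else:
http://www.example.com/robots.txt (correct: robots look for the file in the root directory)
http://www.example.com/seo/robots.txt (wrong: robots do not look for the file in subdirectories)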
Robots.txt writing syntax
First, let’s take a look at a robots.txt example: http://www.shijiazhuangseo.com.cn/robots.txt
Visiting the above address, we can see that the contents of robots.txt are as follows:
# Robots.txt file from http://www.shijiazhuangseo.com.cn
# All robots will spider the domain
User-agent: *
Disallow:
The above text means that all search robots are allowed to access all files under the www.shijiazhuangseo.com.cn site.
Specific syntax analysis: text after # is a comment; User-agent: is followed by the name of a search robot, and if it is followed by *, the record applies to all search robots; Disallow: is followed by the directories or files that must not be accessed.
Below, I will list some specific uses of robots.txt, followed by a combined example:
Allow all robots to access the entire site
User-agent: *
Disallow:
Alternatively, you can create an empty "/robots.txt" file
Prevent all search engines from accessing any part of the site
User-agent: *
Disallow: /
Prevent all search engines from accessing certain parts of the website (the /01/, /02/ and /03/ directories in the example below)
User-agent: *
Disallow: /01/
Disallow: /02/
Disallow: /03/
Block a specific search engine from the entire site (BadBot in the example below)
User-agent: BadBot
Disallow: /
Only allow access from a certain search engine (Crawler in the example below)
User-agent: Crawler
Disallow:
User-agent: *
Disallow: /
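Combining the patterns above, here is a sketch of one possible robots.txt; the directory names /admin/ and /tmp/ and the bot name BadBot are hypothetical placeholders, not rules taken from the example site:
# Block BadBot completely
User-agent: BadBot
Disallow: /

# All other robots may crawl everything except the /admin/ and /tmp/ directories
User-agent: *
Disallow: /admin/
Disallow: /tmp/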
In addition, I think it is necessary to expand the explanation with a brief introduction to the Robots META tag:
The Robots META tag is aimed at individual pages. Like other META tags (such as the language used, the page description, keywords, etc.), the Robots META tag is placed in the <head> of the page and is used specifically to tell search engine robots how to crawl the page's content.
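As a minimal sketch of that placement (the title and description below are placeholders), the Robots META tag sits alongside the other META tags inside the page's <head>:
<head>
<title>Example page</title>
<meta name="description" content="A placeholder page description">
<meta name="robots" content="index,follow">
</head>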
How to write the Robots META tag:
The Robots META tag is not case sensitive. name="Robots" applies to all search engines; for a specific search engine you can write, for example, name="BaiduSpider". The content part has four directive options: index, noindex, follow and nofollow, separated by ",".
The INDEX directive tells the search robot to crawl and index the page;
The FOLLOW directive tells the search robot that it may continue crawling along the links on the page;
The default values of the Robots META tag are INDEX and FOLLOW, except for Inktomi, whose defaults are INDEX, NOFOLLOW.
In this way, there are four combinations:
<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
Of these, <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW"> can be abbreviated as <META NAME="ROBOTS" CONTENT="ALL">, and <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"> can be abbreviated as <META NAME="ROBOTS" CONTENT="NONE">.
At present, it seems that the vast majority of search engine robots comply with the rules of robots.txt, while support for the Robots META tag is not yet as widespread, although it is gradually increasing. For example, the famous search engine GOOGLE fully supports it, and GOOGLE has also added a directive, "archive", which controls whether GOOGLE keeps a snapshot of the page. For example:
<META NAME="googlebot" CONTENT="index,follow,noarchive">
means that the pages of this site may be crawled and the links on the page followed, but GOOGLE should not keep a snapshot of the page.
The above is Shijiazhuang SEO's brief introduction to writing robots.txt.