
robot.txt_html/css_WEB-ITnose

Jun 24, 2016 am 11:53 AM

In China, website administrators do not seem to pay much attention to robots.txt, but some things cannot be done without it, so in this article Shijiazhuang SEO briefly explains how to write a robots.txt file.


Basic introduction to robots.txt

robots.txt is a plain text file. In this file, website administrators can declare the parts of the website that they do not want to be accessed by robots, or specify that search engines only include specified content.

When a search robot (sometimes called a search spider) visits a site, it first checks whether robots.txt exists in the root directory of the site. If it exists, the robot determines its scope of access from the content of the file; if the file does not exist, the robot simply crawls along the links.

In addition, robots.txt must be placed in the root directory of a site, and the file name must be all lowercase.
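As a minimal sketch of the root-directory rule above, the following Python snippet derives the robots.txt address from any page URL on a site. The helper name robots_url is hypothetical and is used here only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper: keeps the scheme and host, replaces the path with
    /robots.txt, and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.shijiazhuangseo.com.cn/some/page.html?id=1"))
# prints: http://www.shijiazhuangseo.com.cn/robots.txt
```

Whatever page a robot starts from, the file it looks for is always at the site root and always uses the lowercase name.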

Robots.txt writing syntax

First, let's take a look at a robots.txt example: http://www.shijiazhuangseo.com.cn/robots.txt

Visiting the above address, we can see that the content of robots.txt is as follows:

# Robots.txt file from http://www.shijiazhuangseo.com.cn

# All robots will spider the domain

User-agent: *

Disallow:

The above text means that all search robots are allowed to access all files under the www.shijiazhuangseo.com.cn site.

Syntax analysis: text after # is a comment; User-agent: is followed by the name of a search robot, and * stands for all search robots; Disallow: is followed by the file or directory path that robots are not allowed to access. An empty Disallow: value blocks nothing.
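One way to check how such rules are interpreted is Python's standard urllib.robotparser module. The sketch below parses rule text directly, so no network access is needed. The helper parse_rules, the robot name AnyBot, and the www.example.com URL are assumptions made for illustration, and other crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The example from the article: a comment, then one record for all robots.
allow_all = parse_rules("""\
# All robots will spider the domain
User-agent: *
Disallow:
""")

# For contrast: a non-empty Disallow value that covers the whole site.
block_all = parse_rules("""\
User-agent: *
Disallow: /
""")

url = "http://www.example.com/index.html"
print(allow_all.can_fetch("AnyBot", url))  # True
print(block_all.can_fetch("AnyBot", url))  # False
```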

Below, I will list some specific uses of robots.txt:

Allow all robots to access

User-agent: *

Disallow:

Alternatively, you can create an empty "/robots.txt" file.

Prevent all search engines from accessing any part of the site

User-agent: *

Disallow: /

Prevent all search engines from accessing several parts of the website (the 01, 02, and 03 directories in the example below)

User-agent: *

Disallow: /01/

Disallow: /02/

Disallow: /03/
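The directory rules above can be tried out with Python's urllib.robotparser, as a rough sketch. The helper is_allowed, the robot name AnyBot, and the host www.example.com are placeholders and do not come from the original article.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /01/
Disallow: /02/
Disallow: /03/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path: str, agent: str = "AnyBot") -> bool:
    """Return True if agent may fetch path on the placeholder host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/01/a.html", "/03/", "/04/a.html", "/index.html"):
    print(path, is_allowed(path))
# /01/a.html False
# /03/ False
# /04/a.html True
# /index.html True
```

This parser compares each Disallow value as a prefix of the URL path, so a rule such as /01/ covers everything inside that directory but not a path like /01 without the trailing slash.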

Block a specific search engine (BadBot in the example below)

User-agent: BadBot

Disallow: /

Allow access only from a specific search engine (Crawler in the example below)

User-agent: Crawler

Disallow:

User-agent: *

Disallow: /
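The two per-robot examples above can be compared side by side with Python's urllib.robotparser. This is only a sketch: the helper parse_rules, the robot names SomeOtherBot and GoodBot, and the www.example.com URL are hypothetical, and real crawlers may match user-agent names somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Block one robot only: every other robot has no matching record.
block_badbot = parse_rules("""\
User-agent: BadBot
Disallow: /
""")

# Allow one robot only: Crawler gets an empty Disallow, everyone else "/".
only_crawler = parse_rules("""\
User-agent: Crawler
Disallow:

User-agent: *
Disallow: /
""")

url = "http://www.example.com/page.html"
print(block_badbot.can_fetch("BadBot", url))        # False
print(block_badbot.can_fetch("GoodBot", url))       # True
print(only_crawler.can_fetch("Crawler", url))       # True
print(only_crawler.can_fetch("SomeOtherBot", url))  # False
```

A robot uses the record that names it when one exists and falls back to the * record otherwise, which is why the Crawler record takes effect even though the * record blocks the whole site.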

In addition, it is worth extending the explanation with a brief introduction to the Robots META tag:

The Robots META tag applies to individual pages. Like other META tags (such as those for the language used, the page description, and keywords), the Robots META tag is placed in the <head> section of the page, and it tells search engine robots how to crawl the content of that page.

How to write the Robots META tag:

The Robots META tag is not case-sensitive. name="Robots" addresses all search engines, while a specific search engine can be addressed with, for example, name="BaiduSpider". The content part has four directive options: index, noindex, follow, and nofollow, separated by ",".

The INDEX directive tells the search robot that it may index the page;

The FOLLOW directive means the search robot may continue crawling along the links on the page;

The default values of the Robots META tag are INDEX and FOLLOW, except for Inktomi, for which the defaults are INDEX, NOFOLLOW.

In this way, there are four combinations:

<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">

<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

Of these,

<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW"> can be written as <META NAME="ROBOTS" CONTENT="ALL">; <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"> can be written as <META NAME="ROBOTS" CONTENT="NONE">
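To illustrate how a robot might read these tags, here is a minimal Python sketch based on the standard html.parser module. The class RobotsMetaParser and the function robots_policy are hypothetical names, and the logic assumes the INDEX, FOLLOW defaults described above together with the ALL and NONE shorthands. It is not how any particular search engine is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self, bot_name: str = "robots"):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values are not.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != self.bot_name:
            return
        for item in attr.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def robots_policy(html: str, bot_name: str = "robots"):
    """Return (may_index, may_follow); both default to True."""
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"></head></html>'
print(robots_policy(page))  # (False, True)
print(robots_policy('<META NAME="ROBOTS" CONTENT="NONE">'))  # (False, False)
print(robots_policy("<html><head></head></html>"))  # (True, True)
```

Passing a different bot_name, such as "baiduspider", would read a tag addressed to one specific search engine instead of the general "robots" tag.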

At present, the vast majority of search engine robots appear to comply with the rules in robots.txt, whereas not many support the Robots META tag yet, although the number is gradually increasing. For example, the well-known search engine Google fully supports it, and Google has also added an "archive" directive that controls whether Google keeps a snapshot of the web page. For example:

<META NAME="googlebot" CONTENT="index,follow,noarchive">

This means: index the pages of this site and crawl along the links in each page, but do not keep a snapshot of the page on Google.

