Table of Contents
如何让搜索引擎抓取AJAX内容解决方案,抓取ajax
怎让百度搜索引擎抓取我的网站内容?
怎不让搜索引擎抓取你的网站信息(摘抄)
Home php教程 php手册 如何让搜索引擎抓取AJAX内容解决方案,抓取ajax

如何让搜索引擎抓取AJAX内容解决方案,抓取ajax

Jun 13, 2016 am 09:26 AM
crawl search engine

如何让搜索引擎抓取AJAX内容解决方案,抓取ajax

越来越多的网站,开始采用"单页面结构"(Single-page application)。

整个网站只有一张网页,采用Ajax技术,根据用户的输入,加载不同的内容。

这种做法的好处是用户体验好、节省流量,缺点是AJAX内容无法被搜索引擎抓取。举例来说,你有一个网站。

<code>  http://example.com   </code>
Copy after login

用户通过井号结构的URL,看到不同的内容。

<code>  http://example.com#1  http://example.com#2  http://example.com#3   </code>
Copy after login

但是,搜索引擎只抓取example.com,不会理会井号,因此也就无法索引内容。

为了解决这个问题,Google提出了"井号+感叹号"的结构。

<code>  http://example.com#!1  </code>
Copy after login

当Google发现上面这样的URL,就自动抓取另一个网址:

<code>  http://example.com/?_escaped_fragment_=1  </code>
Copy after login

只要你把AJAX内容放在这个网址,Google就会收录。但是问题是,"井号+感叹号"非常难看且烦琐。Twitter曾经采用这种结构,它把

<code>  http://twitter.com/ruanyf  </code>
Copy after login

改成

<code>  http://twitter.com/#!/ruanyf  </code>
Copy after login

结果用户抱怨连连,只用了半年就废除了。

那么,有没有什么方法,可以在保持比较直观的URL的同时,还让搜索引擎能够抓取AJAX内容?

我一直以为没有办法做到,直到前两天看到了Discourse创始人之一的Robin Ward的解决方法,不禁拍案叫绝。

Discourse是一个论坛程序,严重依赖Ajax,但是又必须让Google收录内容。它的解决方法就是放弃井号结构,采用 History API。

所谓 History API,指的是不刷新页面的情况下,改变浏览器地址栏显示的URL(准确说,是改变网页的当前状态)。这里有一个例子,你点击上方的按钮,开始播放音乐。然后,再点击下面的链接,看看发生了什么事?

地址栏的URL变了,但是音乐播放没有中断!

History API 的详细介绍,超出这篇文章的范围。这里只简单说,它的作用就是在浏览器的History对象中,添加一条记录。

<code>  window.history.pushState(state object, title, url);  </code>
Copy after login

上面这行命令,可以让地址栏出现新的URL。History对象的pushState方法接受三个参数,新的URL就是第三个参数,前两个参数都可以是null。

<code>  window.history.pushState(null, null, newURL);   </code>
Copy after login

目前,各大浏览器都支持这个方法:Chrome(26.0+),Firefox(20.0+),IE(10.0+),Safari(5.1+),Opera(12.1+)。

下面就是Robin Ward的方法。

首先,用History API替代井号结构,让每个井号都变成正常路径的URL,这样搜索引擎就会抓取每一个网页。

<code>  example.com/1  example.com/2  example.com/3  </code>
Copy after login

然后,定义一个JavaScript函数,处理Ajax部分,根据网址抓取内容(假定使用jQuery)。

<code>function anchorClick(link) {<br>    var linkSplit = link.split('/').pop();<br>    $.get('api/' + linkSplit, function(data) {<br>      $('#content').html(data);<br>    });<br>  }</code>
Copy after login

再定义鼠标的click事件。

<code>  $('#container').on('click', 'a', function(e) {<br>    window.history.pushState(null, null, $(this).attr('href'));<br>    anchorClick($(this).attr('href'));<br>    e.preventDefault();<br>  });  </code>
Copy after login

还要考虑到用户点击浏览器的"前进 / 后退"按钮。这时会触发History对象的popstate事件。

<code>  window.addEventListener('popstate', function(e) {     <br>    anchorClick(location.pathname);  <br>   });</code>
Copy after login

定义完上面三段代码,就能在不刷新页面的情况下,显示正常路径URL和AJAX内容。

最后,设置服务器端。

因为不使用井号结构,每个URL都是一个不同的请求。所以,要求服务器端对所有这些请求,都返回如下结构的网页,防止出现404错误。

<code>  <br>    <br>      <section id="container"></section><br>      <noscript>
<br>        ... ...<br>       </noscript>
<br>    <br>  </code>
Copy after login

仔细看上面这段代码,你会发现有一个noscript标签,这就是奥妙所在。

我们把所有要让搜索引擎收录的内容,都放在noscript标签之中。这样的话,用户依然可以执行AJAX操作,不用刷新页面,但是搜索引擎会收录每个网页的主要内容!

怎让百度搜索引擎抓取我的网站内容?

如果你是新建的站点,百度收录是比较慢的。另外你可以到一些其他的网站上做推广,在“宏建双薪”做一个锚链接,链接地址直接指向你的网站,也就是反向链接的问题!
然后就是等待了……
一般都是google收录比较快,google收录后估计百度就快了!
 

怎不让搜索引擎抓取你的网站信息(摘抄)

首先是在你的网站跟目录下建立个robots.txt文件。什么是robots呢,就是:搜索引擎使用spider程序自动访问互联网上的网页并获取网页信息。spider在访问一个网站时,会首先会检查该网站的根域下是否有一个叫做robots.txt的纯文本文件,这个文件用于指定spider在您网站上的抓取范围。您可以在您的网站中创建一个robots.txt,在文件中声明该网站中不想被搜索引擎收录的部分或者指定搜索引擎只收录特定的部分。仅当您的网站包含不希望被搜索引擎收录的内容时,才需要使用robots.txt文件。如果您希望搜索引擎收录网站上所有内容,请勿建立robots.txt文件 可能你在建立好robots.txt文件后发现自己的网站内容还是会被搜索出来,但您的网页上的内容不会被抓取、建入索引和显示,百度搜索结果中展示的仅是其他网站对您相关网页的描述。禁止搜索引擎在搜索结果中显示网页快照,而只对网页建索引的方法为要防止所有搜索引擎显示您网站的快照,请将此元标记置入网页的

部分:要允许其他搜索引擎显示快照,但仅防止百度显示,请使用以下标记:robots.txt文件的格式"robots.txt"文件包含一条或更多的记录,这些记录通过空行分开(以CR,CR/NL, or NL作为结束符),每一条记录的格式如下所示:":"。在该文件中可以使用#进行注解,具体使用方法和UNIX中的惯例一样。该文件中的记录通常以一行或多行User-agent开始,后面加上若干Disallow和Allow行,详细情况如下:User-agent:该项的值用于描述搜索引擎robot的名字。在"robots.txt"文件中,如果有多条User-agent记录说明有多个robot会受到"robots.txt"的限制,对该文件来说,至少要有一条User-agent记录。如果该项的值设为*,则对任何robot均有效,在"robots.txt"文件中,"User-agent:*"这样的记录只能有一条。如果在"robots.txt"文件中,加入"User-agent:SomeBot"和若干Disallow、Allow行,那么名为"SomeBot"只受到"User-agent:SomeBot"后面的Disallow和Allow行的限制。Disallow:该项的值用于描述不希望被访问的一组URL,这个值可以是一条完整的路径,也可以是路径的非空前缀,以Disallow项的值开头的URL不会被robot访问。例如"Disallow:/help"禁止robot访问/help.html、/helpabc.html、/help/index.html,而"Disallow:/help&......余下全文>>
 
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Repo: How To Revive Teammates
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Baidu cloud disk search engine entrance Baidu cloud disk search engine entrance Feb 27, 2024 pm 01:00 PM

Baidu Cloud is a software that allows users to store many files. So what is the entrance to Baidu Cloud Disk search engine? Users can enter the URL https://pan.baidu.com to enter Baidu Cloud Disk. This sharing of the latest entrance to Baidu Cloud Disk search engine will give you a detailed introduction. The following is a detailed introduction. Take a look. . Baidu cloud disk search engine entrance 1. Qianfan search website: https://pan.qianfan.app Supports network disk: aggregate search, Alibaba, Baidu, Quark, Lanzuo, Tianyi, Xunlei network disk viewing method: login required, follow the company Advantages of obtaining the activation code: The network disk is comprehensive, there are many resources, and the interface is simple. 2. Maolipansou website: alipansou.c

Scrapy case analysis: How to crawl company information on LinkedIn Scrapy case analysis: How to crawl company information on LinkedIn Jun 23, 2023 am 10:04 AM

Scrapy is a Python-based crawler framework that can quickly and easily obtain relevant information on the Internet. In this article, we will use a Scrapy case to analyze in detail how to crawl company information on LinkedIn. Determine the target URL First, we need to make it clear that our target is the company information on LinkedIn. Therefore, we need to find the URL of the LinkedIn company information page. Open the LinkedIn website, enter the company name in the search box, and

How to change search engines on iPhone and iPad How to change search engines on iPhone and iPad Apr 25, 2023 am 08:28 AM

It's easy to change the search engine in Safari, Google Chrome, or other browsers on your iPhone or iPad. This tutorial will show you how to do it on four different web browsers available on iPhone and iPad. How to Change the Safari Search Engine on iPhone or iPad Safari is the default web browser on iOS and iPadOS, but you might not like the search engine. Fortunately, you can use these steps to change it: On your iPhone or iPad, launch Settings from the Home screen. Swipe down and tap Safari from the list. In the next menu,

Java development: How to implement search engine and full-text retrieval functions Java development: How to implement search engine and full-text retrieval functions Sep 21, 2023 pm 01:10 PM

Java development: How to implement search engine and full-text retrieval functions, specific code examples are required Search engines and full-text retrieval are important functions in the modern Internet era. Not only do they help users find what they want quickly, they also provide a better user experience for websites and apps. This article will introduce how to use Java to develop search engines and full-text retrieval functions, and provide some specific code examples. Full-text search using Lucene library Lucene is an open source full-text search engine library, developed by ApacheSo

In the field of artificial intelligence search, Google and Microsoft compete In the field of artificial intelligence search, Google and Microsoft compete Apr 08, 2023 am 11:31 AM

Since its launch late last year, ChatGPT has been seen as a major threat to traditional ways of searching for information. Because it is diverse, you can answer people's questions, write essays or poems, or even write program code. The ability of conversational AI to provide coherent answers is considered a threat to Google's search engine, which for decades has been the benchmark platform for people to search for information on the Internet. OpenAI’s ChatGPT can tailor answers to specific questions asked by users, which can save time browsing websites. A report published by The New York Times in December revealed that ChatGPT’s overnight success forced Google to call it “Code Red” and begin addressing the threat posed by artificial intelligence chatbots to its search engine business. according to

PHP search engine performance optimization: Algolia's magic trick PHP search engine performance optimization: Algolia's magic trick Jul 23, 2023 pm 04:21 PM

PHP Search Engine Performance Optimization: Algolia’s Magical Way With the development of the Internet and the increasing user requirements for search experience, search engine performance optimization has become crucial. In the world of PHP development, Algolia is a powerful and easy-to-integrate search engine service. This article will introduce the magical uses of Algolia and how to optimize the performance of PHP search engines through Algolia. Algolia introduction Algolia is a search engine service provider based on SaaS model.

How to use Google Chrome search engine How to use Google Chrome search engine Jan 04, 2024 am 11:15 AM

Google Chrome is very good. There are many friends who use it. Many friends want to use Google’s own search engine, but don’t know how to use it. Here is a quick look at how to use Google Chrome’s Google search engine. Bar. How to use the Google search engine in Google Chrome: 1. Open Google Chrome and click More in the upper right corner to open settings. 2. After entering settings, click "Search Engine" on the left. 3. Check whether your search engine is "Google". 4. If not, you can click the drop-down button and change it to "Google".

Example of scraping Instagram information using PHP Example of scraping Instagram information using PHP Jun 13, 2023 pm 06:26 PM

Instagram is one of the most popular social media today, with hundreds of millions of active users. Users upload billions of pictures and videos, and this data is very valuable to many businesses and individuals. Therefore, in many cases, it is necessary to use a program to automatically scrape Instagram data. This article will introduce how to use PHP to capture Instagram data and provide implementation examples. Install the cURL extension for PHP cURL is a tool used in various

See all articles