如何让搜索引擎抓取AJAX内容解决方案,抓取ajax
如何让搜索引擎抓取AJAX内容解决方案,抓取ajax
越来越多的网站,开始采用"单页面结构"(Single-page application)。
整个网站只有一张网页,采用Ajax技术,根据用户的输入,加载不同的内容。
这种做法的好处是用户体验好、节省流量,缺点是AJAX内容无法被搜索引擎抓取。举例来说,你有一个网站。
<code> http://example.com </code>
用户通过井号结构的URL,看到不同的内容。
<code> http://example.com#1 http://example.com#2 http://example.com#3 </code>
但是,搜索引擎只抓取example.com,不会理会井号,因此也就无法索引内容。
为了解决这个问题,Google提出了"井号+感叹号"的结构。
<code> http://example.com#!1 </code>
当Google发现上面这样的URL,就自动抓取另一个网址:
<code> http://example.com/?_escaped_fragment_=1 </code>
只要你把AJAX内容放在这个网址,Google就会收录。但是问题是,"井号+感叹号"非常难看且烦琐。Twitter曾经采用这种结构,它把
<code> http://twitter.com/ruanyf </code>
改成
<code> http://twitter.com/#!/ruanyf </code>
结果用户抱怨连连,只用了半年就废除了。
那么,有没有什么方法,可以在保持比较直观的URL的同时,还让搜索引擎能够抓取AJAX内容?
我一直以为没有办法做到,直到前两天看到了Discourse创始人之一的Robin Ward的解决方法,不禁拍案叫绝。
Discourse是一个论坛程序,严重依赖Ajax,但是又必须让Google收录内容。它的解决方法就是放弃井号结构,采用 History API。
所谓 History API,指的是不刷新页面的情况下,改变浏览器地址栏显示的URL(准确说,是改变网页的当前状态)。这里有一个例子,你点击上方的按钮,开始播放音乐。然后,再点击下面的链接,看看发生了什么事?
地址栏的URL变了,但是音乐播放没有中断!
History API 的详细介绍,超出这篇文章的范围。这里只简单说,它的作用就是在浏览器的History对象中,添加一条记录。
<code> window.history.pushState(state object, title, url); </code>
上面这行命令,可以让地址栏出现新的URL。History对象的pushState方法接受三个参数,新的URL就是第三个参数,前两个参数都可以是null。
<code> window.history.pushState(null, null, newURL); </code>
目前,各大浏览器都支持这个方法:Chrome(26.0+),Firefox(20.0+),IE(10.0+),Safari(5.1+),Opera(12.1+)。
下面就是Robin Ward的方法。
首先,用History API替代井号结构,让每个井号都变成正常路径的URL,这样搜索引擎就会抓取每一个网页。
<code> example.com/1 example.com/2 example.com/3 </code>
然后,定义一个JavaScript函数,处理Ajax部分,根据网址抓取内容(假定使用jQuery)。
<code>function anchorClick(link) {<br> var linkSplit = link.split('/').pop();<br> $.get('api/' + linkSplit, function(data) {<br> $('#content').html(data);<br> });<br> }</code>
再定义鼠标的click事件。
<code> $('#container').on('click', 'a', function(e) {<br> window.history.pushState(null, null, $(this).attr('href'));<br> anchorClick($(this).attr('href'));<br> e.preventDefault();<br> }); </code>
还要考虑到用户点击浏览器的"前进 / 后退"按钮。这时会触发History对象的popstate事件。
<code> window.addEventListener('popstate', function(e) { <br> anchorClick(location.pathname); <br> });</code>
定义完上面三段代码,就能在不刷新页面的情况下,显示正常路径URL和AJAX内容。
最后,设置服务器端。
因为不使用井号结构,每个URL都是一个不同的请求。所以,要求服务器端对所有这些请求,都返回如下结构的网页,防止出现404错误。
<code> <br> <br> <section id="container"></section><br> <noscript> <br> ... ...<br> </noscript> <br> <br> </code>
仔细看上面这段代码,你会发现有一个noscript标签,这就是奥妙所在。
我们把所有要让搜索引擎收录的内容,都放在noscript标签之中。这样的话,用户依然可以执行AJAX操作,不用刷新页面,但是搜索引擎会收录每个网页的主要内容!
如果你是新建的站点,百度收录是比较慢的。另外你可以到一些其他的网站上做推广,在“宏建双薪”做一个锚链接,链接地址直接指向你的网站,也就是反向链接的问题!
然后就是等待了……
一般都是google收录比较快,google收录后估计百度就快了!
首先是在你的网站跟目录下建立个robots.txt文件。什么是robots呢,就是:搜索引擎使用spider程序自动访问互联网上的网页并获取网页信息。spider在访问一个网站时,会首先会检查该网站的根域下是否有一个叫做robots.txt的纯文本文件,这个文件用于指定spider在您网站上的抓取范围。您可以在您的网站中创建一个robots.txt,在文件中声明该网站中不想被搜索引擎收录的部分或者指定搜索引擎只收录特定的部分。仅当您的网站包含不希望被搜索引擎收录的内容时,才需要使用robots.txt文件。如果您希望搜索引擎收录网站上所有内容,请勿建立robots.txt文件 可能你在建立好robots.txt文件后发现自己的网站内容还是会被搜索出来,但您的网页上的内容不会被抓取、建入索引和显示,百度搜索结果中展示的仅是其他网站对您相关网页的描述。禁止搜索引擎在搜索结果中显示网页快照,而只对网页建索引的方法为要防止所有搜索引擎显示您网站的快照,请将此元标记置入网页的
部分:要允许其他搜索引擎显示快照,但仅防止百度显示,请使用以下标记:robots.txt文件的格式"robots.txt"文件包含一条或更多的记录,这些记录通过空行分开(以CR,CR/NL, or NL作为结束符),每一条记录的格式如下所示:"
Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Baidu Cloud is a software that allows users to store many files. So what is the entrance to Baidu Cloud Disk search engine? Users can enter the URL https://pan.baidu.com to enter Baidu Cloud Disk. This sharing of the latest entrance to Baidu Cloud Disk search engine will give you a detailed introduction. The following is a detailed introduction. Take a look. . Baidu cloud disk search engine entrance 1. Qianfan search website: https://pan.qianfan.app Supports network disk: aggregate search, Alibaba, Baidu, Quark, Lanzuo, Tianyi, Xunlei network disk viewing method: login required, follow the company Advantages of obtaining the activation code: The network disk is comprehensive, there are many resources, and the interface is simple. 2. Maolipansou website: alipansou.c

Scrapy is a Python-based crawler framework that can quickly and easily obtain relevant information on the Internet. In this article, we will use a Scrapy case to analyze in detail how to crawl company information on LinkedIn. Determine the target URL First, we need to make it clear that our target is the company information on LinkedIn. Therefore, we need to find the URL of the LinkedIn company information page. Open the LinkedIn website, enter the company name in the search box, and

It's easy to change the search engine in Safari, Google Chrome, or other browsers on your iPhone or iPad. This tutorial will show you how to do it on four different web browsers available on iPhone and iPad. How to Change the Safari Search Engine on iPhone or iPad Safari is the default web browser on iOS and iPadOS, but you might not like the search engine. Fortunately, you can use these steps to change it: On your iPhone or iPad, launch Settings from the Home screen. Swipe down and tap Safari from the list. In the next menu,

Java development: How to implement search engine and full-text retrieval functions, specific code examples are required Search engines and full-text retrieval are important functions in the modern Internet era. Not only do they help users find what they want quickly, they also provide a better user experience for websites and apps. This article will introduce how to use Java to develop search engines and full-text retrieval functions, and provide some specific code examples. Full-text search using Lucene library Lucene is an open source full-text search engine library, developed by ApacheSo

Since its launch late last year, ChatGPT has been seen as a major threat to traditional ways of searching for information. Because it is diverse, you can answer people's questions, write essays or poems, or even write program code. The ability of conversational AI to provide coherent answers is considered a threat to Google's search engine, which for decades has been the benchmark platform for people to search for information on the Internet. OpenAI’s ChatGPT can tailor answers to specific questions asked by users, which can save time browsing websites. A report published by The New York Times in December revealed that ChatGPT’s overnight success forced Google to call it “Code Red” and begin addressing the threat posed by artificial intelligence chatbots to its search engine business. according to

PHP Search Engine Performance Optimization: Algolia’s Magical Way With the development of the Internet and the increasing user requirements for search experience, search engine performance optimization has become crucial. In the world of PHP development, Algolia is a powerful and easy-to-integrate search engine service. This article will introduce the magical uses of Algolia and how to optimize the performance of PHP search engines through Algolia. Algolia introduction Algolia is a search engine service provider based on SaaS model.

Google Chrome is very good. There are many friends who use it. Many friends want to use Google’s own search engine, but don’t know how to use it. Here is a quick look at how to use Google Chrome’s Google search engine. Bar. How to use the Google search engine in Google Chrome: 1. Open Google Chrome and click More in the upper right corner to open settings. 2. After entering settings, click "Search Engine" on the left. 3. Check whether your search engine is "Google". 4. If not, you can click the drop-down button and change it to "Google".

Instagram is one of the most popular social media today, with hundreds of millions of active users. Users upload billions of pictures and videos, and this data is very valuable to many businesses and individuals. Therefore, in many cases, it is necessary to use a program to automatically scrape Instagram data. This article will introduce how to use PHP to capture Instagram data and provide implementation examples. Install the cURL extension for PHP cURL is a tool used in various
