In day-to-day SEO work, practitioners deal with content and links. Most now understand that independent, original content is essential to a site's long-term development, but all of this rests on one precondition: avoiding the "spider trap". So what is a spider trap?
What is a "Spider Trap"?
"Spider traps" are obstacles that prevent spider programs from crawling a website. Some website design techniques are very unfriendly to search engines and hinder spider crawling and indexing; these techniques are called spider traps. Their defining feature is that when a spider crawls a particular URL, it enters an infinite loop: there is an entrance but no exit.
What are the common "spider traps":
1. Site search
This is a common place for "spider traps" to arise. When users search for keywords on the site, URLs such as search.php?q= are generated; if these are crawled and indexed by search engines, they can produce a large number of meaningless search-result pages.
Solution: block the dynamic search parameters with the robots.txt file.
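As a quick sanity check, Python's standard `urllib.robotparser` can verify that a robots.txt rule actually blocks the search-result URLs. The `/search.php` path and `q=` parameter below are assumptions for illustration; major search engines additionally support wildcard rules such as `Disallow: /*?q=`.

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks the on-site search results page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search.php
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Search-result URLs are blocked for all crawlers...
print(rp.can_fetch("*", "https://example.com/search.php?q=seo"))       # False
# ...while normal content pages remain crawlable.
print(rp.can_fetch("*", "https://example.com/what-is-a-spider-trap"))  # True
```

Because `Disallow` rules match by URL prefix, a single line covers every keyword variation the search form can generate.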
2. E-commerce products
If you have ever operated an e-commerce website, you will have encountered the problem of product SKU diversity: the same product content is displayed under different URLs depending on the SKU, producing a large number of duplicate pages and seriously wasting the spider's crawl budget.
There is also a special "spider trap" similar to e-commerce product pages: dynamically inserted content, which often lures spiders into subtle traps.
Solution: ensure each page has a canonical URL; the rel="canonical" tag can be used to solve this kind of problem.
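A minimal sketch of the idea, assuming SKU variants appear as query parameters (e.g. `color=`, `size=`) on a single product path: every variant URL is reduced to one canonical URL, which each variant page would then reference from a `<link rel="canonical">` tag in its `<head>`. The URLs and parameter names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Strip SKU-style query parameters so every variant maps to one canonical URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

variants = [
    "https://shop.example.com/product/123?color=red&size=m",
    "https://shop.example.com/product/123?color=blue",
    "https://shop.example.com/product/123",
]

# All variants collapse to the same canonical URL.
canonical = canonical_url(variants[0])
for url in variants:
    assert canonical_url(url) == canonical

# The tag each variant page would carry in its <head>:
print(f'<link rel="canonical" href="{canonical}">')
```

With the tag in place, search engines consolidate ranking signals from all SKU variants onto the canonical page instead of indexing each variant as duplicate content.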
3. Flash website
To satisfy users' visual expectations, web design companies often build corporate sites entirely in Flash. The result may look beautiful, but because current search engines cannot reliably crawl and interpret Flash content, such sites usually struggle to rank.
Solution: do not build the entire site in Flash; if you must use it, embed Flash only in parts of a page.
4. Restricted content
Some sites, in order to attract subscribers, make much of their content visible only after logging in, and in particular force the use of cookies. This misleads spiders: they cannot identify the content, yet they keep trying to crawl the URL.
Solution: when building a website, try to avoid relying on this strategy to attract users.
How to identify "spider traps"
Spider traps are fairly easy to identify; you only need to check the following:
① Website logs: use a log-analysis tool to review the URLs crawled by spiders each day. Any unusual URL addresses deserve further attention.
② Crawl frequency: check the crawl frequency in the Baidu search resource platform. If the value is unusually high on a given day, the site has likely fallen into a spider trap.
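The two checks above can be sketched in a few lines of Python. The log lines below use a simplified, hypothetical format ("date path user-agent"); real server logs differ, but the idea is the same: filter to spider hits and count crawls per path. A path crawled far more often than the rest, usually via many parameterized URL variants, is a likely trap.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical, simplified access-log lines: "date path user-agent".
LOG_LINES = [
    "2023-05-01 /search.php?q=seo Baiduspider",
    "2023-05-01 /search.php?q=seo+tools Baiduspider",
    "2023-05-01 /search.php?q=spider+trap Baiduspider",
    "2023-05-01 /what-is-a-spider-trap Baiduspider",
    "2023-05-01 /about Mozilla/5.0",
]

hits_per_path = Counter()
for line in LOG_LINES:
    date, url, agent = line.split(maxsplit=2)
    if "Baiduspider" in agent:                  # only count search-engine spider hits
        hits_per_path[urlsplit(url).path] += 1  # group URL variants by path

# A path crawled far more often than the rest deserves a closer look.
suspicious = [path for path, n in hits_per_path.items() if n >= 3]
print(suspicious)  # ['/search.php']
```

Here the internal search page accounts for three of the four spider hits, which is exactly the pattern an on-site-search trap produces in real logs.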
Summary: commonly discussed spider traps also include website frames, session IDs, and various kinds of redirects. This article only briefly describes the spider traps most often encountered in practice, and is for reference only.