
2,500 pages of algorithm documents leaked! The most powerful black box in search history is exposed: is Google's run of blunders escalating again?

WBOY
Release: 2024-06-02 12:21:35
Written by Noah

produced | 51CTO Technology Stack (WeChat ID: blog51cto)

Google is having a bit of a bad year.

Over the past few days, its search engine has drawn criticism because the "AI Overviews" feature often returns seriously incorrect results, such as absurdly suggesting that users use glue to keep cheese from sliding off pizza. CEO Pichai had to admit that this was caused by hallucination in the large language model, and that there is currently no solution.

An internal Google Search document was also recently leaked, which may show the public how the search engine operates for the first time. Google has yet to issue an official response to the leak and has not disputed the authenticity of the documents.

The details of how Google, the most famous search engine on the Internet today, ranks websites have long been a mystery. This leak provides a new perspective, giving us a glimpse into Google's highly confidential search algorithm system and into how its operating mechanisms compare with Google's previous public statements.

1. 2,500 pages of leaked documents

Google's search algorithm is perhaps the most influential system on the Internet. It determines the survival of websites and the presentation of online content. However, the specific details of how Google ranks websites have always been a "black box". Despite various speculations by the media, researchers, and people working in search engine optimization, these efforts resemble the blind men feeling the elephant: each grasps only a part, and no one has seen the complete picture.

Now, according to The Verge, this explosive leak seems to have lifted the veil on the search function for the first time, and it hints that Google has not been completely honest over the years in publicly describing how it operates. Google has so far not responded to multiple requests for comment about the authenticity of the documents.

Rand Fishkin, who has been working in SEO for more than ten years, is the central figure in this incident. He revealed that a source shared 2,500 pages of documents with him in the hope of exposing Google's public "lies" about how its search algorithm works.

According to Fishkin, the documents outline Google's search API and break down the information provided to employees. The details Fishkin shares are complex and technical, and may be easier for developers and SEO experts to understand than the average person.

The leak by itself does not necessarily prove that Google uses specific data and signals for search rankings. Instead, the leaked documents outline what data Google collects from web pages, sites and searchers, and indirectly give SEO experts clues about what Google pays attention to.

2. Contradicts Google's public statements

As SEO expert Mike King wrote in his overview of the documents, the leaked documents touch on multiple topics, such as what types of data Google collects and uses, how Google boosts certain sites on sensitive topics like elections, how Google handles smaller sites, and more.

More concerning, according to Fishkin and Mike King, some of the information in the documents appears to contradict Google's public statements.

"It may be too harsh to say 'lying,' but in this case, it is the most appropriate term," Mike King wrote. He added that while he understands what Google's public relations people are trying to do, what he cannot accept is that they demean those in marketing, technology and journalism who make findings and question Google.

Google has not yet responded to The Verge's request for comment on the documents, which included a direct request to rebut their authenticity. Fishkin said in an email to The Verge that Google did not dispute the authenticity of the leak, but that an employee asked him to change some of the wording in his post about an incident.

Google's secretive search algorithm has spawned an industry of marketers who follow Google's public guidelines and implement SEO strategies for millions of companies around the world. However, these widely used methods have gradually led to a general feeling that Google's search results are deteriorating and filling up with spam.

Website operators feel compelled to produce this type of content in order to get their sites seen. But when faced with such doubts, Google's spokespeople always fall back on a familiar line: that is not what our guidelines say.

But some details in the leaked documents cast doubt on the accuracy of Google’s public statements about how its search feature works.

One example cited by Fishkin and Mike King is whether Google uses Chrome data in rankings. Google representatives have repeatedly stated that Chrome data is not used to rank pages, but Chrome is specifically mentioned in a section about how sites appear in searches.

Picture

In the screenshot above, according to the documents, some of the links that appear below the main vogue.com URL may have been created using Chrome data.

Another issue that has attracted attention is the role that E-A-T (expertise, authority and trustworthiness) plays in the rankings. As we all know, E-A-T has been the cornerstone of Google’s search quality assessment guidelines for many years.

Google representatives have previously stated that E-A-T is not a ranking factor. Fishkin noted that he didn't find many direct references to E-A-T in the documents.

Also, Google representatives have previously insisted that author attribution (bylines) is something website owners should do for readers, not for Google, because it doesn't affect rankings. But that doesn't seem to be the whole story.

Mike King detailed how Google collects author data for pages, noting that the documents contain a field used to identify whether an entity is an author. Although this field is mainly designed for news articles, it also covers other content such as scientific articles. While this doesn't confirm that author attribution is an explicit ranking factor, it does suggest that Google is at least tracking this attribute closely.

3. Search algorithm innovation, the Internet ecosystem has "changed" since then

Although these documents are not conclusive evidence, they provide an in-depth and unfiltered perspective that gives us a glimpse of this highly confidential black-box system.

In fact, over the past two years, Google Search has gone through a series of major updates, some of them unprecedented and disruptive. The much-criticized "AI Overviews" feature mentioned at the beginning of this article is one of the most representative examples.

When the change was first introduced, Google chief Pichai said that in the future, Google Search would provide AI-generated answers to many of your questions, and he expressed strong confidence in the feature.

A Google spokesperson told the BBC that the company only rolls out search changes after rigorous testing confirms that they will benefit users, and that the company provides website owners with help, resources and the opportunity to give feedback on their search rankings.

But reality always deviates from the ideal.

Whether it is the "fatal hallucinations" of the AI Overviews feature or the "inconsistent" information conveyed in these suspected leaked documents, both are leading people to view Google Search with suspicion and vigilance.

Looking back at the entire history of the Internet, no company has changed the way most people on this planet obtain information as much as Google has, and it has also reshaped the way content is created and distributed.

Taking generative AI in search as an example, Google seems to be aiming to connect users and information more efficiently through these technological innovations and to improve the overall quality of the search experience.

But in fact, as critics say, this shift may exacerbate information homogeneity and reduce the depth and breadth with which users explore the web, as they increasingly rely on the short answers Google provides directly instead of visiting the source websites themselves. This may not only weaken the visibility and business models of independent websites and blogs, but may also harm the health and diversity of the online ecosystem, limiting users' exposure to diverse viewpoints and in-depth analysis.

For a search player as powerful as Google, perhaps the only foundation for long-term development is to ensure that search algorithm optimization serves the public without destroying the ecological cornerstones that contribute high-quality content to the Internet.

Reference link:

https://www.theverge.com/2024/5/28/24166177/google-search-ranking-algorithm-leak-documents-link-seo

https://www.php.cn/link/c30ca4400db3c72274c8ad819f688c21


source:51cto.com