


# Transformer's 6th Anniversary: It Didn't Even Get a NeurIPS Oral Back Then, but Its 8 Authors Have Founded Several AI Unicorns
From ChatGPT to AI image generation, the recent wave of breakthroughs in artificial intelligence may well be owed to the Transformer.
Today marks the sixth anniversary of the submission of the famous Transformer paper.
## Paper link: https://arxiv.org/abs/1706.03762
Six years ago, a paper with a somewhat grandiose title was uploaded to the preprint platform arXiv. The phrase "xx is All You Need" has since been repeated endlessly by developers in the AI field and has even become a trend in paper titles, and "Transformer" no longer just means the Transformers franchise: it now stands for the most advanced technology in the field of AI.
Six years later, looking back at this paper, we find many interesting or little-known details, as NVIDIA AI scientist Jim Fan has summarized.
The Transformer model abandons the traditional CNN and RNN units; the entire network structure is composed purely of attention mechanisms.
Although the Transformer paper is titled "Attention is All You Need", and we keep praising the attention mechanism because of it, note an interesting fact: the Transformer's researchers did not invent attention; rather, they pushed the mechanism to its extreme.
The attention mechanism was proposed in 2014 by a team led by deep learning pioneer Yoshua Bengio, in a paper with a comparatively plain title: "Neural Machine Translation by Jointly Learning to Align and Translate".
In this ICLR 2015 paper, Bengio et al. combined an RNN with a "context vector" (that is, attention). Although it is one of the greatest milestones in NLP, it is far less well known than the Transformer: the Bengio team's paper has been cited about 29,000 times to date, versus 77,000 for the Transformer paper.
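To make that idea concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention, in which a small learned scorer compares a decoder query against each encoder state and the softmaxed scores weight a context vector. The parameter names `W_q`, `W_k`, and `v` and the shapes are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def additive_attention(query, keys, values, W_q, W_k, v):
    """Additive (Bahdanau-style) attention sketch.
    query: (d_q,) decoder state; keys, values: (seq_len, d_k) encoder states.
    W_q: (d_q, d_a), W_k: (d_k, d_a), v: (d_a,) are illustrative learned parameters."""
    # Score every encoder position with a tiny feed-forward net.
    scores = np.tanh(query @ W_q + keys @ W_k) @ v   # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over positions
    context = weights @ values                       # weighted sum = context vector
    return context, weights

# Toy usage: 5 encoder states of dimension 16, attention size 32.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 16))
ctx, w = additive_attention(rng.normal(size=16), h, h,
                            rng.normal(size=(16, 32)), rng.normal(size=(16, 32)),
                            rng.normal(size=32))
print(ctx.shape, w.round(2))                         # (16,) context and 5 attention weights
```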
AI's attention mechanism is naturally modeled on human visual attention. The human brain has an innate ability: when we look at a picture, we first scan it quickly and then focus on the region that deserves attention.
If you refused to let go of any local information, you would inevitably do a great deal of useless work, which is not conducive to survival. Likewise, introducing a similar mechanism into deep networks can simplify models and speed up computation. In essence, attention filters out a small amount of important information from a large amount of input, focuses on it, and ignores most of what is unimportant.
In recent years, attention mechanisms have been used widely across deep learning, for example to capture receptive fields on images in computer vision, or to locate key tokens or features in NLP. A large body of experiments shows that models with attention mechanisms achieve significant performance improvements in tasks such as image classification, segmentation, tracking, and enhancement, as well as natural language recognition, understanding, question answering, and translation.
The Transformer model built around the attention mechanism can be regarded as a general-purpose sequence computer. When processing an input sequence, the attention mechanism lets the model assign different attention weights based on the correlation between positions in the sequence, which allows the Transformer to capture long-range dependencies and contextual information and thereby improves its handling of sequences.
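As a concrete illustration, below is a minimal NumPy sketch of the scaled dot-product self-attention at the heart of the Transformer: each position's query is compared against every key, the softmaxed scores become the attention weights, and the output is a weighted sum of the values. The function and variable names here are our own, and a real Transformer adds learned Q/K/V projections, multiple heads, masking, and positional encodings on top of this.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity between positions
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V                              # weighted sum of values

# Toy self-attention over a sequence of 4 token embeddings of size 8.
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```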
But at the time, neither the Transformer paper nor the original attention paper talked about a universal sequence computer. Instead, the authors saw it as a mechanism for solving a narrow, specific problem: machine translation. So when we trace the origins of AGI in the future, we may well trace them back to the "humble" Google Translate.
## Although it was accepted by NeurIPS 2017, it didn't even get an Oral
Although the Transformer paper is hugely influential today, at NeurIPS 2017, the world's top AI conference that year, it did not even get an Oral, let alone an award. The conference received 3,240 submissions that year and accepted 678 of them, the Transformer paper among them. Of these, 40 were Oral papers and 112 were Spotlights; there were also 3 Best Paper awards and a Test of Time award, none of which went to the Transformer.
Although it missed out on the NeurIPS 2017 awards, the Transformer's influence is plain for all to see.
Jim Fan commented that this is not the reviewers' fault: it is simply hard to recognize the importance of an influential study before it becomes influential. Still, some papers are lucky enough to be recognized immediately. ResNet, proposed by Kaiming He and colleagues, won the CVPR 2016 Best Paper award, recognition that was well deserved and correctly given by a top AI conference. But back in 2017, even very smart researchers could not have predicted the changes that LLMs are bringing today, just as in the 1980s few people could have foreseen the tsunami that deep learning has unleashed since 2012.
## Eight authors, each with a remarkable story
The paper had 8 authors, from Google and the University of Toronto. Today, most of them have left their original institutions.
On April 26, 2022, a company called Adept was officially founded with 9 co-founders, including two authors of the Transformer paper, Ashish Vaswani and Niki Parmar.
Ashish Vaswani obtained his PhD from the University of Southern California under the supervision of David Chiang and Liang Huang, focusing on early applications of modern deep learning to language modeling. He joined Google Brain in 2016 and led the Transformer research before leaving Google in 2021.
Niki Parmar graduated from the University of Southern California with a master's degree and joined Google in 2016. There she developed several successful question-answering and text-similarity models for Google Search and Ads, and led early work extending the Transformer model into areas such as image generation and computer vision. She also left Google in 2021.
After leaving, the two co-founded Adept, with Ashish Vaswani as chief scientist and Niki Parmar as chief technology officer. Adept's vision is to build an "AI teammate" trained to use a wide variety of software tools and APIs.
In March 2023, Adept announced a US$350 million Series B round that pushed its valuation past US$1 billion, making it a unicorn. By the time of that public fundraising, however, Niki Parmar and Ashish Vaswani had already left Adept to found a new AI company of their own. That new company is still operating in stealth, so no detailed information about it is available.
Another author, Noam Shazeer, was one of Google's most important early employees. He joined Google at the end of 2000, finally left in 2021, and then became CEO of a startup called Character.AI.
Character.AI's other founder is Daniel De Freitas; both came from Google's LaMDA team, where they built LaMDA, the language model behind Google's conversational programs.
In March this year, Character.AI announced US$150 million in financing at a valuation of US$1 billion. It is one of the few startups with the potential to compete with OpenAI, the organization behind ChatGPT, and one of the rare companies to become a unicorn in only 16 months. Its application, Character.AI, is a neural-language-model chatbot that can generate human-like text responses and hold contextual conversations.
Character.AI was released on the Apple App Store and Google Play Store on May 23, 2023, and was downloaded more than 1.7 million times in its first week. In May 2023, the service added a US$9.99-per-month paid subscription called c.ai, which gives users priority chat access, faster response times, early access to new features, and other perks.
Aidan N. Gomez left Google as early as 2019, then worked as a researcher at FOR.ai, and is now co-founder and CEO of Cohere.
Cohere is a generative AI startup founded in 2019 whose core business is providing NLP models and helping enterprises improve human-computer interaction. Its three founders are Ivan Zhang, Nick Frosst, and Aidan Gomez; Gomez and Frosst are former members of the Google Brain team. In November 2021, Google Cloud announced a partnership with Cohere under which Google Cloud would power the Cohere platform with its robust infrastructure and Cohere would use Cloud's TPUs to develop and deploy its products.
It is worth noting that Cohere has just raised US$270 million in Series C financing, becoming a unicorn valued at US$2.2 billion.
Łukasz Kaiser left Google in 2021 after 7 years and 9 months there and is now a researcher at OpenAI. As a research scientist at Google, he took part in designing state-of-the-art neural models for machine translation, parsing, and other algorithmic and generative tasks, and was a co-author of the TensorFlow system and the Tensor2Tensor library.
Jakob Uszkoreit left Google in 2021 after 13 years there and co-founded Inceptive, an AI pharmaceutical company dedicated to using deep learning to design RNA drugs.
While at Google, Jakob Uszkoreit helped form the language-understanding team for Google Assistant and, in the early days, also worked on Google Translate.
Illia Polosukhin left Google in 2017 and is now co-founder and CTO of NEAR.AI (a blockchain infrastructure company).
The only one still at Google is Llion Jones; this year is his ninth at the company.
Six years have now passed since the publication of "Attention Is All You Need". Some of the original authors chose to leave and some chose to stay at Google; either way, the Transformer's influence continues.