ChatGPT's impressive performance in few-shot and zero-shot settings has further convinced researchers that pre-training is the right path.
Pretrained Foundation Models (PFMs) are regarded as the basis for a wide range of downstream tasks across different data modalities. That is, pretrained foundation models such as BERT, GPT-3, MAE, DALL-E and ChatGPT are trained on large-scale data to provide a reasonable parameter initialization for downstream applications.
The pre-training idea behind PFMs plays an important role in the application of large models. Unlike earlier approaches that relied on convolutional and recurrent modules for feature extraction, the generative pre-training (GPT) method uses the Transformer as its feature extractor and performs autoregressive training on large datasets.
As PFMs have achieved great success in various fields, a large number of methods, datasets and evaluation metrics have been proposed in recent years. The field needs a comprehensive survey that traces the development of PFMs from BERT to ChatGPT.
Recently, researchers from Beihang University, Michigan State University, Lehigh University, Nanyang Technological University, Duke University and many other well-known universities and companies in China and abroad jointly wrote a survey on pretrained foundation models. It covers recent research progress in the text, image and graph domains, as well as current and future challenges and opportunities.
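To make "autoregressive training" concrete, here is a minimal sketch that assumes a toy bigram count model in place of the Transformer. The objective is the same negative log-likelihood, -sum_t log p(x_t | x_<t), but the bigram model conditions only on the previous token. The function names `train_bigram` and `autoregressive_nll` are hypothetical, and there is no smoothing, so an unseen word pair raises a `KeyError`.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Estimate p(next | prev) by counting adjacent token pairs (maximum likelihood)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        model[prev] = {word: c / total for word, c in ctr.items()}
    return model


def autoregressive_nll(model, sentence):
    """Negative log-likelihood -sum_t log p(x_t | x_<t); the toy model only looks at x_{t-1}."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return -sum(math.log(model[prev][nxt]) for prev, nxt in zip(tokens, tokens[1:]))


corpus = ["the cat sat", "the dog sat"]
model = train_bigram(corpus)
print(autoregressive_nll(model, "the cat sat"))
```

Here the only uncertain step is choosing "cat" versus "dog" after "the" (probability 0.5 each), so the sentence's loss equals log 2. A Transformer replaces the count table with a learned network that conditions on the whole prefix.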
Paper link: https://arxiv.org/pdf/2302.09419.pdf
The researchers first review the basic components and existing pre-training methods in natural language processing, computer vision and graph learning. They then discuss advanced PFMs for other data modalities, unified PFMs that take data quality and quantity into account, and research on the fundamentals of PFMs, including model efficiency and compression as well as security and privacy. Finally, the article lists several key conclusions, including future research directions, challenges and open issues.
Pretrained foundation models (PFMs) are an essential part of building artificial intelligence systems in the big-data era. They have been widely researched and applied in the three major AI fields of natural language processing (NLP), computer vision (CV) and graph learning (GL).
PFMs are general models that are effective within individual fields and in cross-domain tasks, and they show great potential for learning feature representations across many tasks, such as text classification, text generation, image classification, object detection and graph classification.
PFMs perform well when pre-trained on multiple tasks over large-scale corpora and then fine-tuned on similar small-scale tasks, which makes rapid data processing possible.
PFMs are built on pre-training techniques, which use large amounts of data and tasks to train a general model that can then be easily fine-tuned for different downstream applications.
The idea of pre-training originated from transfer learning in CV tasks. After its effectiveness was demonstrated in CV, researchers began applying pre-training techniques to improve model performance in other fields. When pre-training is applied to NLP, well-trained language models (LMs) can capture rich knowledge that benefits downstream tasks, such as long-range dependencies and hierarchical relationships.
Moreover, a significant advantage of pre-training in NLP is that the training data can come from any unlabeled text corpus; in other words, a virtually unlimited amount of training data is available for pre-training.
Early pre-training methods such as NNLM and Word2vec were static and struggled to adapt to different semantic contexts; later, researchers proposed dynamic pre-training techniques such as BERT and XLNet.
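The pre-train-then-fine-tune idea can be illustrated at toy scale. The sketch below is an assumption-laden analogy rather than how PFMs are actually trained: a two-parameter linear model `y = w*x + b` stands in for a large network, "pre-training" fits it on plenty of data from a general task, and the same small fine-tuning budget (five gradient steps on three examples from a related task) is then spent starting either from the pre-trained weights or from zero. The helper names and the synthetic tasks are hypothetical.

```python
def mse(w, b, data):
    """Mean squared error of the linear model y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(w, b, data, lr, steps):
    """Plain full-batch gradient descent on the mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# "Pre-training": lots of data from a general task (y = 2x + 1).
pretrain_data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]
w_pre, b_pre = gradient_descent(0.0, 0.0, pretrain_data, lr=0.5, steps=1000)

# Downstream task: only three examples from a related task (y = 2x + 1.5).
task_data = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]

# Identical, deliberately small fine-tuning budget for both starting points.
w_ft, b_ft = gradient_descent(w_pre, b_pre, task_data, lr=0.1, steps=5)
w_scratch, b_scratch = gradient_descent(0.0, 0.0, task_data, lr=0.1, steps=5)

print(f"fine-tuned from pre-trained init: {mse(w_ft, b_ft, task_data):.4f}")
print(f"trained from scratch:             {mse(w_scratch, b_scratch, task_data):.4f}")
```

Because the pre-trained slope already matches the downstream task, fine-tuning only has to adjust the intercept, so the pre-trained start reaches a much lower loss in the same number of steps. This mirrors, in miniature, the claim that a good initialization lets a general model adapt to a related task cheaply.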
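The reason unlabeled text suffices is that the "label" can be read off the text itself. Here is a minimal sketch of that idea for next-token prediction, assuming naive whitespace tokenization and a fixed-size context window; the function name `next_token_examples` is hypothetical, and real PFMs use subword tokenizers and far longer contexts.

```python
def next_token_examples(text, context_size=3):
    """Turn raw, unlabeled text into (context, target) pairs for next-token prediction.

    The target is simply the token that follows each context window, so no human
    annotation is needed."""
    tokens = text.split()  # naive whitespace tokenization, for illustration only
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        examples.append((context, tokens[i]))
    return examples


raw_text = "pre-training needs no human labels"
for context, target in next_token_examples(raw_text, context_size=2):
    print(context, "->", target)
```

Every additional sentence of raw text yields more training pairs in the same way, which is why the supply of pre-training data is limited mainly by how much text can be collected.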
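The difference between static and dynamic representations can be shown with a deliberately simplified toy. In the sketch below, the 2-D word vectors are hypothetical, and the "contextual" encoder just averages each word's vector with its neighbours. This is only a stand-in for the deep Transformer self-attention that BERT and XLNet actually use, but it reproduces the key property: the same word gets different vectors in different contexts.

```python
# Hypothetical 2-D word vectors: a static model stores exactly one vector per word type.
STATIC_VECTORS = {
    "river": (1.0, 0.0),
    "money": (0.0, 1.0),
    "bank": (0.5, 0.5),
}


def static_embed(tokens):
    """Static (Word2vec-style) lookup: 'bank' gets the same vector in every sentence."""
    return [STATIC_VECTORS[t] for t in tokens]


def toy_contextual_embed(tokens, window=1):
    """Toy dynamic embedding: average each word's vector with its neighbours' vectors,
    so the result depends on the surrounding words."""
    result = []
    for i in range(len(tokens)):
        neighbourhood = tokens[max(0, i - window): i + window + 1]
        vectors = [STATIC_VECTORS[t] for t in neighbourhood]
        result.append(tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(2)))
    return result


s1, s2 = ["river", "bank"], ["money", "bank"]
print(static_embed(s1)[1], static_embed(s2)[1])                  # same vector both times
print(toy_contextual_embed(s1)[1], toy_contextual_embed(s2)[1])  # different vectors
```

With the static lookup, "bank" cannot distinguish a riverbank from a financial bank; the context-dependent version moves "bank" toward "river" in one sentence and toward "money" in the other.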
The history and evolution of PFMs in the fields of NLP, CV and GL
Built on pre-training techniques, PFMs use large corpora to learn general semantic representations. Following these pioneering works, a variety of PFMs have emerged and been applied to downstream tasks and applications.
A prominent example of a PFM application is the recently popular ChatGPT.
ChatGPT is obtained by fine-tuning a generative pre-trained Transformer, GPT-3.5, which was trained on a mixed corpus of text and code. ChatGPT uses reinforcement learning from human feedback (RLHF), currently the most promising method for aligning large LMs with human intentions.
ChatGPT's superior performance may mark a turning point in the training paradigm of every type of PFM, namely the adoption of instruction-alignment techniques, including reinforcement learning (RL), prompt tuning and chain-of-thought prompting, ultimately leading toward artificial general intelligence.
In this article, the researchers mainly review PFMs for text, images and graphs, which is also a relatively mature way of categorizing the research.
For text, a language model can perform a variety of tasks by predicting the next word or character; for example, PFMs can be used for machine translation, question answering, topic modeling, sentiment analysis and more.
For images, as with text PFMs, large-scale datasets are used to train a large model that is suitable for multiple CV tasks.
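RLHF, as commonly described in the literature, involves several stages: supervised fine-tuning, training a reward model on human comparisons between answers, and then optimizing the language model against that reward with RL. The sketch below covers only the reward-model stage and assumes the widely used pairwise (Bradley-Terry style) loss, -log(sigmoid(r_chosen - r_rejected)). The function name and the reward scores are hypothetical; a real reward model would produce the scores with a neural network.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss for training a reward model from a human comparison.

    Computes -log(sigmoid(r_chosen - r_rejected)): small when the answer the human
    preferred already scores higher, large when the ranking is reversed."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)); branch on the sign of m to avoid overflow in exp
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for (preferred answer, rejected answer).
pairs = [(2.0, 0.5), (0.3, 1.1)]
for chosen, rejected in pairs:
    print(f"{pairwise_preference_loss(chosen, rejected):.3f}")
```

Minimizing this loss over many human comparisons pushes the reward model to rank answers the way people do; the RL stage then uses that learned reward as its training signal.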
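"Predicting the next word" can drive generation directly. Here is a minimal sketch of greedy decoding, assuming a hand-written table of next-word probabilities: every word and probability in `NEXT_WORD_PROBS` is hypothetical, whereas a real PFM computes these distributions with a Transformer over the full prefix and often samples rather than always taking the most likely word.

```python
# Hypothetical next-word probabilities, conditioned only on the previous word.
NEXT_WORD_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"movie": 0.5, "plot": 0.3, "end": 0.2},
    "movie": {"was": 0.9, "</s>": 0.1},
    "was": {"great": 0.7, "boring": 0.3},
    "great": {"</s>": 1.0},
    "boring": {"</s>": 1.0},
}


def greedy_generate(probs, max_len=10):
    """Repeatedly append the most probable next word until the end marker or max_len."""
    token, output = "<s>", []
    for _ in range(max_len):
        candidates = probs.get(token)
        if not candidates:
            break
        token = max(candidates, key=candidates.get)
        if token == "</s>":
            break
        output.append(token)
    return " ".join(output)


print(greedy_generate(NEXT_WORD_PROBS))
```

The same mechanism can be repurposed for a task such as sentiment analysis: given the prefix "the movie was", comparing the probabilities of "great" and "boring" acts as a crude classifier.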
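One concrete image pre-training recipe is MAE, mentioned earlier: the image is cut into patches, a large fraction of them (75% in the MAE paper) is hidden, and the model learns to reconstruct the hidden patches. The sketch below covers only the patchify-and-mask step on a toy 8x8 "image" stored as a list of rows; the encoder and decoder are omitted, and the helper names are hypothetical.

```python
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping patch x patch blocks,
    in row-major order. Assumes H and W are divisible by `patch`."""
    h, w = len(image), len(image[0])
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def random_mask(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices to hide; the encoder would see only the visible ones."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 "image"
patches = patchify(image, patch=2)                         # 16 patches of 2x2 pixels
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=random.Random(0))
print("visible patches:", visible)
print("masked patches: ", masked)
```

Because the pixels themselves serve as reconstruction targets, this pretext task, like next-word prediction for text, needs no human labels.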
For graphs, similar pre-training ideas are also used to obtain PFMs, which can be used for many downstream tasks.
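As one illustration of a graph pretext task (an assumption chosen for clarity; graph pre-training covers many other strategies), the sketch below hides a fraction of a graph's edges and pairs them with an equal number of sampled non-edges, giving a self-supervised link-prediction task. The function name is hypothetical, and the sampling loop assumes the graph has at least as many non-edges as masked edges; otherwise it would never terminate.

```python
import random


def edge_masking_task(edges, num_nodes, mask_ratio, rng):
    """Build a self-supervised link-prediction task from an unlabeled graph.

    Returns (visible edges the model may use, masked edges to recover as positives,
    sampled non-edges as negatives). Edges are treated as undirected."""
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    num_masked = max(1, int(len(edges) * mask_ratio))
    positives, visible = shuffled[:num_masked], shuffled[num_masked:]
    negatives = set()
    while len(negatives) < num_masked:  # assumes enough non-edges exist
        u, v = rng.sample(range(num_nodes), 2)
        pair = (min(u, v), max(u, v))
        if pair not in edge_set:
            negatives.add(pair)
    return visible, positives, sorted(negatives)


ring = [(i, (i + 1) % 6) for i in range(6)]  # toy 6-node cycle graph
visible, positives, negatives = edge_masking_task(ring, num_nodes=6, mask_ratio=0.5, rng=random.Random(0))
print("visible:", visible)
print("positives:", positives)
print("negatives:", negatives)
```

A graph model pre-trained to separate the hidden edges from the sampled non-edges, using only the visible edges, learns node representations that can then be fine-tuned for downstream tasks such as graph classification.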
Beyond PFMs for specific data domains, this article also reviews other advanced PFMs, such as PFMs for speech, video and cross-domain data, as well as multimodal PFMs.
In addition, a broad convergence trend toward PFMs that can handle multiple modalities is emerging, the so-called unified PFMs. The researchers first define the concept of unified PFMs and then review the most advanced unified PFMs in recent research, including OFA, UNIFIED-IO, FLAVA and BEiT-3.
Based on the characteristics of existing PFMs in these three fields, the researchers concluded that PFMs have the following two major advantages:
1. Only minimal fine-tuning is required to improve the model's performance on downstream tasks;
2. PFMs have already proven themselves in terms of quality.
Rather than building a model from scratch to solve a similar problem, a better option is to apply PFMs to a task-relevant dataset.
The great promise of PFMs has inspired a large body of related work on issues such as model efficiency, security and compression.
The characteristics of this review are:
Reference: https://arxiv.org/abs/2302.09419