


The large model survey is here! One article to clarify the evolution of large models from the global AI giants
Xi Xiaoyao Science and Technology Talks Original
Author | Xiaoxi, Python
If you are new to large models, what goes through your mind the first time you see the odd combination of GPT, PaLM, and LLaMA? And if you dig a little deeper and strange names like BERT, BART, RoBERTa, and ELMo keep popping up one after another, would a newcomer not go a little crazy?
Even a veteran of the NLP community may struggle to keep up with the explosive pace at which large models are evolving: which technique belongs to which school? At this point, a survey of large models can help. The survey "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond", by researchers from Amazon, Texas A&M University, and Rice University, builds a "family tree" that traces the past, present, and future of large models represented by ChatGPT. Organized around tasks, it also offers a very comprehensive practical guide, introducing the strengths and weaknesses of large models on different tasks, and finally pointing out the current risks and challenges large models face.
Paper title:
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Paper link: https://www.php.cn/link/f50fb34f27bd263e6be8ffcf8967ced0
Project homepage: https://www.php.cn/link/968b15768f3d19770471e9436d97913c
Family tree - the past and present of large models
The search for the "origin" of large models should probably start with the paper "Attention is All You Need", in which Google's machine translation team proposed the Transformer, a translation model composed of stacked Encoder and Decoder blocks. From there, the development of large models has broadly followed two paths. One path abandons the Decoder and uses only the Encoder for pre-training; its most famous representatives are the BERT family. These models pioneered "unsupervised pre-training" to better exploit large-scale natural language data, which is far easier to obtain than labelled data. The "unsupervised" objective here is the Masked Language Model (MLM): some words in a sentence are masked out, and the model learns to use the surrounding context to predict the masked words. When BERT appeared it was considered a bombshell in NLP, reaching SOTA on many common natural language processing tasks such as sentiment analysis and named entity recognition. Besides BERT and ALBERT from Google, outstanding members of the BERT family include Baidu's ERNIE, Meta's RoBERTa, and Microsoft's DeBERTa.
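The MLM objective described above can be sketched in a few lines. This is a minimal illustration, not any library's actual masking routine; the function name and 15% default are our own choices (15% echoes BERT's commonly cited masking rate):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with [MASK]. Returns the masked
    sequence plus a dict mapping each masked position to the original
    word, which is the target the model must predict from context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

tokens = "the model learns to predict missing words from context".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)   # sentence with some words replaced by [MASK]
print(targets)  # {position: original word} — the prediction targets
```

The model only ever sees `masked`; learning to recover `targets` from context is what gives the BERT family its bidirectional understanding.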
Unfortunately, BERT's approach failed to break through the scaling law; that breakthrough belongs to the mainstay of today's large models, i.e. the other path of large model development: the GPT family, which abandons the Encoder and builds on the Decoder alone. The success of the GPT family comes from a researcher's surprising discovery: "expanding the size of the language model can significantly improve zero-shot and few-shot learning ability." This differs greatly from the fine-tuning-based BERT family, and it is the source of the magical power of today's large language models. The GPT family is trained to predict the next word given the preceding word sequence, so GPT initially appeared only as a text generation model; the arrival of GPT-3 was the turning point in the family's fate. GPT-3 was the first to show people the magical capabilities of large models beyond text generation itself, demonstrating the superiority of these autoregressive language models. Starting from GPT-3, today's ChatGPT, GPT-4, Bard, PaLM, and LLaMA have flourished, ushering in the current era of large models.
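The next-word training objective of the GPT family can be made concrete with a toy sketch (the function name is ours; real implementations work on token IDs in parallel, but the supervision signal is exactly these context-target pairs):

```python
def next_token_pairs(tokens):
    """Build (context, target) training pairs for an autoregressive LM:
    at every position, the model must predict the next token given all
    of the preceding ones."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("large models predict the next word".split())
for ctx, tgt in pairs:
    print(" ".join(ctx), "->", tgt)
# e.g. "large -> models", "large models -> predict", ...
```

A single sentence thus yields one training example per position, which is why raw unlabelled text is such an abundant training signal for this family.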
Merging the two branches of this family tree, we can see the early days of Word2Vec and FastText, the first explorations of pre-training with ELMo and ULMFiT, then BERT's sudden blockbuster arrival, while the GPT family worked quietly until the stunning debut of GPT-3 and the meteoric rise of ChatGPT. Beyond the iteration of technology, we can also see how OpenAI quietly stuck to its own technical path and ultimately became the undisputed leader in LLMs; how Google made major theoretical contributions to the entire Encoder-Decoder architecture; how Meta continues to participate generously in open-sourcing large models; and, of course, how LLMs have gradually trended toward "closed" source since GPT-3, so that in the future most research may well have to become API-based.
Data - the source of power of large models
In the final analysis, does the magical ability of large models really come from GPT itself? I think the answer is no. Almost every leap in capability of the GPT family came with important improvements in the quantity, quality, and diversity of the pre-training data. That data includes books, articles, website text, code, and more. The purpose of feeding it to a large model is to reflect "human beings" fully and accurately: by teaching the model words, grammar, syntax, and semantics, it gains the ability to recognize context and generate coherent responses, capturing human knowledge, language, culture, and so on.
Generally speaking, NLP tasks can be divided into zero-shot, few-shot, and many-shot settings according to the annotated data available. LLMs are undoubtedly the most appropriate method for zero-shot tasks; almost without exception, large models are far ahead of other models there. Few-shot tasks also suit large models very well: by showing the model "question-answer" pairs, its performance can be enhanced, an approach generally called In-Context Learning. Large models can cover many-shot tasks too, but fine-tuning may still be the best method there; that said, under constraints such as privacy or limited compute, large models may still be useful.
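In-Context Learning, as described above, amounts to formatting a handful of demonstrations into the prompt itself; no weights are updated. A minimal sketch, with the function name and Q/A template being our own illustrative choices:

```python
def build_few_shot_prompt(examples, query):
    """Format labelled (question, answer) demonstrations followed by a
    new question - the way in-context learning presents a task to an
    LLM without any gradient update."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

demos = [
    ("Is 'the movie was wonderful' positive or negative?", "positive"),
    ("Is 'the plot made no sense' positive or negative?", "negative"),
]
prompt = build_few_shot_prompt(
    demos, "Is 'a delightful surprise' positive or negative?")
print(prompt)
```

The prompt ends with a dangling `A:` so that the model's next-token prediction naturally completes the answer, which is exactly the autoregressive objective being repurposed as task-solving.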
At the same time, fine-tuned models are likely to face shifts between the training and test data distributions, and notably, fine-tuned models generally perform poorly on out-of-distribution (OOD) data. Correspondingly, LLMs perform much better because they have no explicit fitting process on the task data. Typically, ChatGPT, trained with reinforcement learning from human feedback (RLHF), performs well on most out-of-distribution classification and translation tasks, and also does well on DDXPlus, a medical diagnosis dataset designed for OOD evaluation.
Practical Guide - Task-oriented Getting Started with Large Models
Often, the assertion "large models are great!" is followed by the question "how and when should we use them?" Faced with a specific task, should we fine-tune, or reach for a large model without a second thought? This paper summarizes a practical "decision flow" that helps us decide whether to use a large model based on a series of questions such as "is human-like output required", "are reasoning capabilities required", and "is it a multi-task setting".
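The flavor of such a decision flow can be captured in a toy function. To be clear, the question names and branch order below are our own paraphrase of the survey's flowchart, not a faithful reproduction of it:

```python
def choose_approach(needs_human_like_generation, needs_reasoning,
                    is_multi_task, has_rich_labels, ood_risk):
    """Toy decision flow in the spirit of the survey: prefer an LLM when
    the task needs human-like output, reasoning, multi-tasking, or faces
    out-of-distribution data; prefer fine-tuning when rich labelled data
    exists and the test distribution matches training."""
    if needs_human_like_generation or needs_reasoning or is_multi_task:
        return "LLM"
    if has_rich_labels and not ood_risk:
        return "fine-tuned model"
    return "LLM"

# A classic well-labelled NLU task with matched distributions:
print(choose_approach(False, False, False, True, False))
```

Walking a concrete task through such questions is often more useful than a blanket "always use the biggest model" policy.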
From the perspective of NLP task classification:
Traditional natural language understanding
For the many NLP tasks that currently have large amounts of rich labelled data, fine-tuned models may still hold a firm advantage; on most datasets, LLMs are inferior to fine-tuned models. Specifically:
- Text classification: in text classification, LLMs are generally inferior to fine-tuned models;
- Sentiment analysis: on IMDB and SST, large models and fine-tuned models perform similarly, but on tasks such as toxicity detection, almost all large models are worse than fine-tuned models;
- Natural language inference: on RTE and SNLI, fine-tuned models beat LLMs, while on data such as CB, LLMs are on par with fine-tuned models;
- Question answering: on SQuAD v2, QuAC, and many other datasets, fine-tuned models perform better, while on CoQA, LLMs perform similarly to fine-tuned models;
- Information retrieval: LLMs have not yet been widely applied to information retrieval; the characteristics of the task mean there is no natural way to cast information retrieval as a large model task;
- Named entity recognition: in named entity recognition, large models are still significantly inferior to fine-tuned models; fine-tuned performance on CoNLL03 is almost twice that of large models. Still, named entity recognition, as a classic intermediate NLP task, may well end up absorbed by large models.
In short, for most traditional natural language understanding tasks, fine-tuned models perform better. Of course, the potential of LLMs may be limited by prompt engineering that has not yet fully unlocked it (and, in fairness, fine-tuned models have not reached their ceiling either). Meanwhile, in some niche settings, such as Miscellaneous Text Classification and Adversarial NLI, LLMs' stronger generalization ability leads to better performance; but for now, on tasks with mature labelled data, fine-tuned models may still be the optimal solution.
Natural Language Generation
Compared with natural language understanding, natural language generation may be the stage where large models shine. Its main goal is to create coherent, fluent, and meaningful sequences, and it can usually be divided into two categories: tasks represented by machine translation and summarization, and more open-ended writing tasks such as composing emails, news, or stories. Specifically:
- Text summarization: for summarization, LLMs show no obvious advantage under traditional automatic metrics such as ROUGE, but once human evaluation is introduced, LLMs perform significantly better than fine-tuned models. This actually shows that current automatic metrics sometimes fail to fully and accurately reflect the quality of generated text;
- Machine translation: on a task with mature commercial software like machine translation, LLMs are generally slightly inferior to commercial translation tools, but on some low-resource languages LLMs sometimes do better. For example, on Romanian-to-English translation, LLMs beat the fine-tuned SOTA in both zero-shot and few-shot settings;
- Open-ended generation: open-ended generation is what large models do best. News articles generated by LLMs are almost indistinguishable from real news written by humans, and in areas such as code generation and code repair, LLMs show astonishing performance.
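The summarization point above is easy to see once you know how ROUGE works: it counts n-gram overlap with a reference, so a fluent paraphrase can score poorly. Here is a simplified ROUGE-1 F1 (unigram overlap only; real ROUGE implementations add stemming and more variants):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram-overlap precision
    and recall between a candidate summary and a reference."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

ref = "the cat sat on the mat"
paraphrase = "a cat was sitting on a mat"  # adequate meaning, low overlap
print(round(rouge1_f(paraphrase, ref), 3))  # 0.462 despite being faithful
```

A perfectly faithful paraphrase scores under 0.5 here, which illustrates why human evaluation can rank LLM summaries higher than automatic metrics do.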
Knowledge-intensive tasks
Knowledge-intensive tasks generally refer to tasks that rely heavily on background knowledge, domain-specific expertise, or general world knowledge. Unlike simple pattern recognition or syntactic analysis, they require "common sense" about the real world and the ability to use it correctly. Specifically:
- Closed-book question answering: in closed-book QA, the model must answer factual questions without access to external information. LLMs show better performance on many datasets such as NaturalQuestions, WebQuestions, and TriviaQA; on TriviaQA in particular, even zero-shot LLMs outperform fine-tuned models;
- Large-scale multi-task language understanding: MMLU contains multiple-choice questions on 57 different subjects and likewise requires broad general knowledge. The most impressive model here is GPT-4, which achieved 86.5% accuracy on MMLU.
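Scoring on MMLU-style benchmarks is simple exact-match accuracy over predicted option letters, which is worth keeping in mind when comparing headline numbers. A minimal sketch (function name and sample data are our own):

```python
def multiple_choice_accuracy(predictions, answers):
    """Fraction of multiple-choice questions where the predicted option
    letter exactly matches the gold answer, as in MMLU-style scoring."""
    if len(predictions) != len(answers):
        raise ValueError("prediction/answer lists must be the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

gold = ["B", "A", "D", "C"]
preds = ["B", "A", "C", "C"]  # one mistake out of four
print(multiple_choice_accuracy(preds, gold))  # 0.75
```

Reported MMLU scores are typically this accuracy averaged over all 57 subjects, so a single headline figure can hide large per-subject variation.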
It is worth noting that large models are not always effective on knowledge-intensive tasks. Sometimes a large model is useless, or even wrong, about real-world knowledge, and such "inconsistent" knowledge can make it perform worse than random guessing. For example, the Redefine Math task asks the model to choose between a symbol's original meaning and a redefined meaning, which demands an ability exactly opposed to the knowledge a large language model has learned; accordingly, LLMs perform even worse than random guessing there.
Reasoning tasks
Scaling can greatly enhance the capabilities of pre-trained language models: as model size grows exponentially, some key abilities such as reasoning are gradually activated along with the parameters, and the arithmetic and commonsense reasoning abilities of LLMs become visibly powerful. On this type of task:
- Arithmetic reasoning: it is no exaggeration to say that GPT-4's arithmetic and reasoning abilities exceed those of any previous model. Large models achieve breakthrough results on GSM8K, SVAMP, and AQuA. It is worth noting that the arithmetic ability of LLMs can be significantly strengthened by chain-of-thought (CoT) prompting;
- Commonsense reasoning: commonsense reasoning requires large models both to remember factual information and to perform multi-step reasoning. On most datasets, LLMs maintain their dominance over fine-tuned models; on ARC-C in particular (difficult science exam questions for grades three to nine), GPT-4's performance approaches 100% (96.3%).
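Chain-of-thought prompting, mentioned in the arithmetic bullet above, simply means that each demonstration in the prompt includes worked reasoning before its answer. A minimal sketch (the template wording and function name are our own; the trailing "Let's think step by step" cue echoes a commonly used zero-shot CoT trigger phrase):

```python
def cot_prompt(demos, question):
    """Build a chain-of-thought prompt: each demonstration shows its
    reasoning before the answer, nudging the model to reason step by
    step on the new question rather than answer directly."""
    parts = []
    for q, reasoning, answer in demos:
        parts.append(f"Q: {q}\nA: {reasoning} So the answer is {answer}.")
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

demo = [("Tom has 3 apples and buys 2 more. How many does he have?",
         "He starts with 3 apples and gains 2, and 3 + 2 = 5.", "5")]
print(cot_prompt(demo,
                 "A shelf holds 4 books and 3 more are added. How many books?"))
```

The only difference from the plain few-shot prompt is the intermediate reasoning text, yet this is the change the survey credits with significantly boosting arithmetic performance.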
Beyond reasoning, as model scale grows, some emergent abilities also appear, such as symbol manipulation, logical deduction, and concept understanding. However, there is also an interesting phenomenon called the "U-shaped phenomenon": as the scale of LLMs increases, performance on some tasks first improves and then begins to decline. The typical example is the Redefine Math problem mentioned above. Such phenomena call for deeper and more detailed research into the principles of large models.
Summary - Challenges and future of large models
Large models will inevitably be part of our work and lives for a long time to come, and for such a "big guy" that interacts so closely with our lives, beyond performance, efficiency, and cost, the safety of large language models is nearly the top priority among all the challenges they face. Machine hallucination is the main problem for which there is currently no excellent solution: biased or harmful hallucinated output can have serious consequences for users. Moreover, as the "credibility" of LLMs grows, users may become over-reliant on them and trust them to provide accurate information; this foreseeable trend increases the safety risks of large models.
Beyond misleading information, because the text LLMs generate is high in quality and low in cost, LLMs may be used as tools for attacks involving hate, discrimination, violence, and rumors. LLMs may also themselves be attacked into providing illegal information or leaking private data. It has been reported that Samsung employees accidentally leaked top-secret data, such as the source code of a new program and records of internal hardware-related meetings, while using ChatGPT for work.
In addition, whether large models can be used in sensitive fields such as healthcare, finance, and law hinges on their "trustworthiness". At present, the zero-shot robustness of large models often degrades. At the same time, LLMs have been shown to exhibit social bias and discrimination, with many studies observing significant performance differences across demographic categories such as accent, religion, gender, and race. This can lead to "fairness" problems for large models.
Finally, stepping back from social issues to take stock, we can also look toward the future of large model research. The main challenges currently faced by large models can be summarized as follows:
- Practical verification: current evaluation datasets for large models are often academic datasets that are more like "toys", yet such datasets cannot fully reflect the varied problems and challenges of the real world. There is an urgent need for diverse, complex, realistic datasets to evaluate models on real-world problems and ensure they can meet real-world challenges;
- Model alignment: the power of large models raises another issue. Models should be aligned with human values to ensure their behavior matches expectations and does not "reinforce" undesirable outcomes. As advanced, complex systems, if this ethical issue is not taken seriously, large models could brew a disaster for humanity;
- Safety hazards: research on large models should further emphasize safety and eliminate hazards. Dedicated research is needed to ensure the safe development of large models, with more work on model interpretability, supervision, and governance. Safety should be an integral part of model development, not an optional, icing-on-the-cake decoration;
- Model future: will performance keep increasing with model scale? This question is probably hard even for OpenAI to answer. Our understanding of the magical phenomena of large models is still very limited, and insights into their principles remain precious.