Large language models can do things they were never explicitly trained to do, which gives them an air of magic and has made them a focus of hype and attention from both the media and researchers.
When a language model is scaled up, new capabilities occasionally appear that are absent in smaller models. This property, reminiscent of "creativity", is called an "emergent" ability, and it is often described as a significant step toward artificial general intelligence.
Now, researchers from Google, Stanford, DeepMind, and the University of North Carolina are exploring these "emergent" abilities in large language models.
Natural language processing (NLP) has been revolutionized by language models trained on large amounts of text data. Scaling up language models often improves performance and sample efficiency on a range of downstream NLP tasks.
In many cases, we can predict the performance of a large language model by extrapolating the performance trends of smaller models. For example, the effect of scale on language model perplexity has been demonstrated across more than seven orders of magnitude.
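As a rough illustration of what such extrapolation looks like in practice, here is a minimal Python sketch (not from the paper; all numbers are invented for illustration) that fits a power law to small-model measurements and projects the trend to a larger compute budget.

# A minimal sketch of scaling-trend extrapolation: fit a power law,
# loss ~ a * compute^(-b), to small-model measurements and predict the
# loss of a much larger training run. The data points are made up.
import numpy as np

# (training compute in FLOPs, validation loss) for hypothetical small models
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([4.0, 3.4, 2.9, 2.5])

# A power law is a straight line in log-log space, so fit it with
# ordinary least squares on the logarithms.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)

def predict_loss(c: float) -> float:
    """Extrapolate the fitted trend to a larger compute budget."""
    return 10 ** (intercept + slope * np.log10(c))

print(predict_loss(1e23))  # predicted loss for a far larger training run

Emergent abilities are interesting precisely because they do not follow this kind of smooth, predictable curve.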
However, performance on some other tasks does not improve in such a predictable way.
For example, the GPT-3 paper shows that the language model's ability to perform multi-digit addition has a flat scaling curve from 100M to 13B parameters, with roughly random performance, until at a certain point performance jumps sharply.
Given the increasing use of language models in NLP research, it is important to better understand these capabilities that may arise unexpectedly.
In a recent paper, "Emergent Abilities of Large Language Models", published in Transactions on Machine Learning Research (TMLR), the researchers present dozens of examples of "emergent" abilities produced by scaling up language models.
The existence of this "emergent" capability raises the question of whether additional scaling can further expand the range of capabilities of language models.
Certain prompting and fine-tuning methods only yield improvements in larger models
First, we discuss the "emergent" abilities that may appear in prompted tasks.
In this type of task, a pre-trained language model is given a prompt framed as next-word prediction, and it performs the task by completing the response.
Without any further fine-tuning, language models can often perform tasks not seen during training.
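To make the prompting setup concrete, here is a minimal sketch of a few-shot prompt for multi-digit addition, one of the tasks discussed below. The generate function is a hypothetical placeholder for whatever text-completion model or API is being used; it is not an interface from the paper.

# A minimal sketch of few-shot prompting: the task is framed as text
# completion, and the model "answers" by predicting the next tokens.
few_shot_prompt = """\
Q: What is 123 + 456?
A: 579

Q: What is 317 + 285?
A: 602

Q: What is 914 + 188?
A:"""

def generate(prompt: str) -> str:
    """Placeholder: call your language model's completion endpoint here."""
    raise NotImplementedError

# answer = generate(few_shot_prompt)
# A sufficiently large model is expected to complete the prompt with "1102";
# below the scale threshold, completions are close to random.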
We call a task "emergent" when its performance unpredictably surges from random to well above random at a specific scale threshold.
Below we present three examples of prompted tasks with "emergent" performance: multi-step arithmetic, taking a college-level exam, and identifying the intended meaning of a word.
In each case, language models perform poorly, with little dependence on model size, until a certain threshold is reached, at which point their performance spikes.
Performance on these tasks becomes non-random only for models of sufficient scale: for example, when training compute exceeds 10^22 floating-point operations (FLOPs) for the arithmetic and multi-task NLU tasks, and 10^24 FLOPs for the word-in-context task.
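The toy sketch below, an illustration of this pattern rather than the paper's methodology, shows the shape being described: accuracy hovers near chance across smaller models and then jumps well above it once a scale threshold is crossed. The curve values are hypothetical.

# Illustrative check for an emergence-shaped curve (not from the paper).
def looks_emergent(points, chance=0.0, margin=0.10):
    """points: list of (training_flops, accuracy) sorted by compute.
    Returns True if all smaller models stay within `margin` of chance
    while the largest model is clearly above it."""
    *smaller, largest = points
    near_random = all(acc <= chance + margin for _, acc in smaller)
    above_random = largest[1] > chance + margin
    return near_random and above_random

# Hypothetical numbers shaped like the multi-digit-addition curve:
# flat and near-random up to ~10**22 training FLOPs, then a jump.
curve = [(1e20, 0.02), (1e21, 0.03), (3e21, 0.05), (1e22, 0.04), (1e23, 0.35)]
print(looks_emergent(curve))  # True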
The second category of "emergent" abilities covers prompting strategies that enhance language model capabilities.
Prompting strategies are broad paradigms for prompting that can be applied to a range of different tasks. They are considered "emergent" when they fail for small models and only work for sufficiently large models.
Chain-of-thought prompting is a typical example of an "emergent" prompting strategy, in which the model is prompted to generate a series of intermediate reasoning steps before giving the final answer.
Chain-of-thought prompting enables language models to perform tasks that require complex reasoning, such as multi-step math word problems.
Notably, models acquire the ability to do chain-of-thought reasoning without being explicitly trained for it. An example of a chain-of-thought prompt is shown below.
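Since the original figure is not reproduced here, the sketch below gives an illustrative chain-of-thought prompt in the style described; the wording follows the widely cited worked example from the chain-of-thought work, reconstructed from memory rather than copied from the paper.

# Illustrative chain-of-thought prompt: the worked example spells out its
# reasoning steps, and the model is expected to imitate that pattern for
# the new question instead of guessing the answer directly.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 cans of tennis balls. Each can has
3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. It used 20 to make lunch and bought 6 more.
How many apples does it have now?
A:"""

# A large model prompted this way tends to produce intermediate steps
# ("23 - 20 = 3, 3 + 6 = 9") before the final answer.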
The empirical results for chain-of-thought prompting are as follows.
For smaller models, chain-of-thought prompting is no better than standard prompting, for example on GSM8K, a challenging benchmark of math word problems.
For large models, however, chain-of-thought prompting achieved a 57% solve rate on GSM8K in the authors' experiments, a significant improvement in performance.
So what is the significance of studying "emergent" abilities?
Identifying “emergent” capabilities in large language models is the first step in understanding this phenomenon and its potential impact on future model capabilities.
For example, because "emergent" few-shot prompting abilities and strategies are not explicitly encoded in pre-training, researchers may not know the full range of few-shot prompting abilities of current language models.
In addition, whether further scaling will endow even larger models with new "emergent" abilities is an important open question.
The researchers say the answers to these questions are not yet known.
However, as the field of NLP continues to develop, it is very important to analyze and understand the behavior of language models, including the "emergent" capabilities produced by scaling.