


After GPT-4's release, what will happen to other large models? Yann LeCun: Augmented language models may be the way forward
The popularity of ChatGPT and GPT-4 has given large language models their moment in the spotlight. But where do they go next?
Yann LeCun recently co-authored a survey arguing that augmenting language models may be a promising direction.
The paper is a review; this article briefly summarizes its main content.
Research background
Large language models have greatly advanced natural language processing, and the underlying technology powers several products with millions of users, including the coding assistant Copilot, the Google search engine, and, most recently, ChatGPT. By combining memorization with compositional capabilities, large language models can perform tasks such as language understanding or conditional and unconditional text generation with unprecedented performance, making higher-bandwidth human-computer interaction a reality.
However, large language models still have limitations that prevent their wider deployment. They often produce plausible but non-factual predictions, commonly called hallucinations, which leads to many avoidable errors, for example in arithmetic or within reasoning chains. In addition, many breakthrough capabilities of large language models seem to emerge only as scale, measured by the number of trainable parameters, increases: for example, researchers have shown that once a model reaches a certain size, it can solve some BIG-bench tasks via few-shot prompting. Although a series of recent works has produced smaller language models that retain some capabilities of the largest ones, the size and data requirements of large language models keep their training and maintenance costs high. Continual learning for large models remains an open research problem, and Goldberg has discussed other limitations of large language models in the context of the GPT-3-based chatbot ChatGPT.
In a recent survey, researchers from Meta and other institutions argue that these problems stem from an essential flaw of large language models: they are usually trained to perform statistical language modeling given (i) a single parametric model and (ii) limited context, usually the n preceding or surrounding tokens. Although n has grown thanks to software and hardware innovations in recent years, most models still use a relatively small context compared with the potentially large context needed to always perform language modeling correctly. Models therefore require enormous scale to store knowledge that is not present in the context but is necessary for the task at hand.
Paper link: https://arxiv.org/pdf/2302.07842v1.pdf
Consequently, a growing body of research aims to solve these problems by deviating slightly from the purely statistical language-modeling paradigm described above.
For example, one line of work circumvents the limited context size by increasing the relevance of the context: information extracted from relevant external documents is added to it. By equipping a language model with a module that retrieves such documents from a database for a given context, it is possible to match some capabilities of the largest language models with far fewer parameters. Note that the resulting model is no longer purely parametric, since it can query external data sources. More generally, a language model can also improve its context through reasoning strategies that produce more relevant context, at the cost of extra computation, before generating an answer.
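A minimal sketch of this retrieval idea, using a toy word-overlap scorer as a stand-in for a learned retriever (the documents and scoring function below are illustrative, not from the paper):

```python
# Minimal sketch of retrieval augmentation: score documents against the
# query by word overlap (a stand-in for a learned dense retriever), then
# prepend the best matches to the prompt so the model can condition on them.

def score(query: str, doc: str) -> int:
    """Count query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list) -> str:
    """Prepend the retrieved documents to the query as extra context."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Photosynthesis converts light energy into chemical energy.",
    "Paris is the capital of France.",
]
prompt = build_prompt("When was the Eiffel Tower completed?", docs)
```

The model that consumes `prompt` no longer needs the completion date in its weights; only the retriever's database does.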
Another strategy is to let the language model use external tools to augment the current context with important information missing from its weights. Although much of this work aims to mitigate the shortcomings mentioned above, it also directly suggests that more systematic use of reasoning and tools to augment language models may lead to more powerful agents. The authors call these models Augmented Language Models (ALMs). As this trend accelerates, the number of related studies has grown dramatically, making it necessary to classify the works and define the technical terms they use in different ways.
The terms used in this paper are defined as follows:
Reasoning. In the context of augmented language models, reasoning means decomposing a potentially complex task into simpler subtasks that the language model can solve more easily, either on its own or with tools. Subtasks can be decomposed in various ways, for example recursively or iteratively. In this sense, reasoning is similar to "planning" as defined in LeCun's 2022 paper "A Path Towards Autonomous Machine Intelligence". In this survey, reasoning usually refers to various strategies for improving a language model's reasoning skills, such as step-by-step reasoning with few-shot examples. It is not entirely clear whether the language model is actually reasoning or simply generating a larger context that increases the likelihood of correctly predicting the missing tokens. The discussion of this topic by Huang and Chang (2022) may be helpful: although "reasoning" may be an abuse of language given current SOTA results, the term is already in use in the community. A more pragmatic definition of reasoning in augmented language models is giving the model more computational steps before it generates the answer to a prompt.
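The step-by-step few-shot strategy can be sketched as a prompt builder; the demonstration problem and wording below are illustrative, not taken from the survey:

```python
# Sketch of few-shot step-by-step ("chain-of-thought") prompting: each
# demonstration shows intermediate reasoning before the final answer,
# nudging the model to generate its own reasoning steps at test time.

EXAMPLES = [
    {
        "question": "Roger has 5 balls and buys 2 cans of 3 balls each. "
                    "How many balls does he have?",
        "reasoning": "Roger starts with 5 balls. 2 cans of 3 balls is "
                     "6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot prompt whose demonstrations include reasoning."""
    parts = []
    for ex in EXAMPLES:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: Let's think step by step. {ex['reasoning']} "
            f"The answer is {ex['answer']}.\n"
        )
    # End with the new question and an open-ended reasoning cue.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n".join(parts)

prompt = build_cot_prompt("A farm has 3 hens that each lay 4 eggs. How many eggs?")
```

Nothing in the model changes here; the extra computation comes entirely from the longer generation the prompt elicits.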
Tool. For an augmented language model, a tool is an external module, typically called via a rule or a special token, whose output is included in the model's context. A tool can gather external information or have an effect on the virtual or physical world (often perceived by the augmented language model). An example of a tool that gathers external information is a document retriever; an example of one with external effects is a robotic arm. Tools can be called at training or at inference time. In general, learning to interact with a tool may amount to learning to call its API.
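A rough sketch of rule-based tool invocation via a special token, assuming a hypothetical `[CALC ...]` syntax (not any real model's vocabulary) and a small arithmetic tool:

```python
import ast
import operator
import re

# Sketch of rule-based tool invocation: the model emits a special token
# such as [CALC 12*7], the host detects it, runs the tool, and splices
# the result back into the context. The [CALC ...] syntax is hypothetical.

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str):
    """Evaluate a pure arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def expand_tool_calls(text: str) -> str:
    """Replace every [CALC expr] token with the tool's numeric result."""
    return re.sub(r"\[CALC ([^\]]+)\]",
                  lambda m: str(safe_eval(m.group(1))), text)

output = expand_tool_calls("The total is [CALC 12*7] apples.")
```

The same pattern generalizes to any tool: a detector for the token, a call to the external module, and substitution of the result into the context.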
Action. For an augmented language model, an action is the invocation of a tool that has an effect on the virtual or physical world, together with observing the result, typically by including it in the model's current context. Some of the works covered in this article, for example, discuss web search or controlling a robotic arm through a language model. Overloading the term slightly, researchers sometimes also call a model's invocation of a tool an action even when it has no external effect.
Why discuss reasoning and tools together? Combining reasoning and tools in a language model should make it possible to solve a broad range of complex tasks without heuristics, and therefore to generalize better. Typically, reasoning helps the language model decompose a given problem into potentially simpler subtasks, while tools help get each step right, for example by obtaining the result of a mathematical operation. In other words, reasoning is a way for the language model to combine different tools to solve complex tasks, and tools are a way to avoid reasoning failures through effective decomposition. Both should benefit from the other. Moreover, reasoning and tools can be placed under the same "hood", since both augment the context of the language model so that it better predicts the missing tokens, albeit in different ways.
Why discuss tools and actions together? Tools that gather additional information and tools that have an effect on the virtual or physical world can be invoked by a language model in the same way. For example, there seems to be no difference between a language model outputting Python code that solves a mathematical operation and one outputting Python code that operates a robotic arm. Some of the works discussed in the paper already use language models that act on the virtual or physical world. From this point of view, language models have the potential to act, and their progress as a direction toward automated agents is worth looking forward to.
The survey divides the included research into three parts. Section 2 examines work on enhancing the reasoning capabilities of language models, as defined above. Section 3 focuses on work that lets language models interact with external tools and take actions. Section 4 explores whether reasoning and tool use are achieved through heuristics or through learning, for example via supervision or reinforcement. The survey also discusses other components in Section 5. For brevity, it focuses on work that combines reasoning or tools with language models. Finally, although the focus is on large language models, not all of the studies considered employ large models, so for accuracy the term "language models" is used throughout the rest of the survey.
Reasoning
Previous work has shown that large language models can solve simple reasoning problems but struggle with complex ones; this section therefore focuses on strategies for enhancing the reasoning skills of language models. One challenge that complex reasoning problems pose to a language model is correctly obtaining the solution by composing the answers it correctly predicts for the subproblems. For example, a language model may accurately predict the birth and death dates of a famous person yet fail to accurately predict their age at death. Some researchers call this difference the compositionality gap of language models. The remainder of the section discusses work on three popular paradigms for eliciting reasoning in language models. Since the present work focuses on reasoning combined with tools, readers are referred to other researchers' more in-depth discussions of reasoning in large language models.
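A toy illustration of closing this gap by decomposition, with a hard-coded fact table standing in for the model's reliable single-hop predictions (the helper names are ours, not the survey's):

```python
# Instead of asking one multi-hop question ("How old was X at death?"),
# decompose it into two single-hop subquestions the model answers
# reliably, then compose the sub-answers with exact arithmetic.

FACTS = {  # stand-in for reliable single-hop model predictions
    "birth year of Ada Lovelace": 1815,
    "death year of Ada Lovelace": 1852,
}

def answer_subquestion(question: str) -> int:
    """Single-hop lookup (in a real system, one model call per subquestion)."""
    return FACTS[question]

def age_at_death(person: str) -> int:
    """Compose two sub-answers rather than trusting one multi-hop prediction."""
    born = answer_subquestion(f"birth year of {person}")
    died = answer_subquestion(f"death year of {person}")
    return died - born

age = age_at_death("Ada Lovelace")
```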
Use of Tools and Actions
Recent lines of language model research allow the model to access knowledge that is not necessarily stored in its weights, such as factual knowledge. More precisely, tasks such as exact computation or information retrieval can be offloaded to external modules, such as a Python interpreter or a search-engine module queried by the model; in these cases the model is using tools. Furthermore, when a tool has an effect on the external world, we can say the language model performed an action. Tools and actions can both be conveniently included in the form of special tokens, a feature that combines naturally with Transformer language modeling.
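A minimal sketch of offloading exact computation to a Python interpreter; the "model output" below is hard-coded for illustration, where a real system would have the language model generate it, and a real deployment would sandbox the execution:

```python
# Offloading computation: the model writes a short Python program, and
# the host executes it in an isolated namespace to obtain an exact
# result, rather than trusting the model's own arithmetic.

model_output = """
cans = 2
balls_per_can = 3
answer = 5 + cans * balls_per_can
"""

def run_offloaded_program(program: str):
    """Execute model-written code and read back the 'answer' variable."""
    namespace = {}
    exec(program, {"__builtins__": {}}, namespace)  # no builtins exposed
    return namespace["answer"]

result = run_offloaded_program(model_output)
```

Stripping `__builtins__` is a crude precaution, not a real sandbox; production systems would run such code in a proper isolated environment.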
After reviewing how language models can be augmented to exercise reasoning and to use tools, the survey also describes how models can be taught to apply these abilities.
For more research details, please refer to the original paper.