
Meta improves the Transformer architecture: a new attention mechanism that enhances reasoning capabilities

PHPz
Release: 2023-11-24 14:53:48

Large language models (LLMs) are undoubtedly powerful, but they still sometimes make simple mistakes that reveal weak reasoning ability.

For example, an LLM may make incorrect judgments because of irrelevant context, or because of preferences or opinions inherent in the input prompt. The latter is a problem known as "sycophancy", where the model tends to agree with the input. Is there a way to alleviate this type of problem? Some researchers have tried adding more supervised training data or using reinforcement learning strategies, but these methods do not solve the problem at its root.

In a recent study, Meta researchers argue that the problem is rooted in the way the Transformer model itself is built, and in particular in its attention mechanism. Specifically, soft attention tends to assign probability to most of the context (including irrelevant parts) and focuses too heavily on repeated tokens.
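The claim about soft attention can be illustrated with a small sketch. It is not taken from the paper: the scores below are made-up numbers, and `softmax` is written by hand so the example has no dependencies. It only shows that a softmax over attention scores gives every token a strictly positive weight, so irrelevant tokens can collectively outweigh the relevant one.

```python
import math

def softmax(scores):
    """Convert raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores: one relevant token followed by five irrelevant ones.
scores = [2.0, 0.5, 0.5, 0.5, 0.5, 0.5]
weights = softmax(scores)

relevant_mass = weights[0]
irrelevant_mass = sum(weights[1:])

# Every token gets a strictly positive weight, so irrelevant context
# is never fully ignored.
print(f"relevant: {relevant_mass:.2f}, irrelevant: {irrelevant_mass:.2f}")
```

With these assumed scores, the five irrelevant tokens together receive slightly more than half of the total weight, even though each one scores well below the relevant token.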

The researchers therefore propose a completely different approach to attention, one that performs attention by using the LLM as a natural language reasoner. Specifically, they leverage the LLM's ability to follow instructions, prompting it to generate the context it should attend to, so that the context includes only relevant material that will not skew its reasoning. The researchers call this process System 2 Attention (S2A), and they view the underlying transformer and its attention mechanism as automatic operations similar to human System 1 reasoning.

When people need to pay special attention to a task and System 1 is likely to make an error, System 2 allocates effortful mental activity and takes over. This subsystem therefore has goals similar to those of S2A, with which the researchers hope to alleviate the above failures of the transformer's soft attention through extra deliberate effort by the reasoning engine.

Paper link: https://arxiv.org/pdf/2311.11829.pdf

The researchers describe the class of S2A mechanisms, its motivation, and several specific implementations in detail. In the experiments, they confirm that S2A produces outputs that are more objective and less subjectively biased or sycophantic than those of a standard attention-based LLM, especially when the question contains distracting opinions.

On the modified TriviaQA dataset, S2A improves factuality from 62.8% to 80.3% compared with LLaMA-2-70B-chat. In the task of generating long-form arguments from inputs that contain distracting sentiment, S2A improves objectivity by 57.4% and is essentially unaffected by the inserted opinions. In addition, for math word problems with irrelevant sentences from GSM-IC, S2A improves accuracy from 51.7% to 61.3%.

This study was recommended by Yann LeCun.

System 2 Attention

Figure 1 shows an example of spurious correlation. Even the most powerful LLMs can change their answer to a simple factual question when the context contains irrelevant sentences, because words that appear in the context inadvertently raise the probability of an incorrect answer.

We therefore need a more deliberate, more thoughtful attention mechanism that relies on deeper understanding. To distinguish it from the lower-level attention mechanism, the researchers call it System 2 Attention (S2A). They explore a way to build this attention mechanism with the LLM itself: an instruction-tuned LLM is prompted to rewrite the context by removing irrelevant text. In this way, the LLM can reason carefully and decide which parts of the input are relevant before generating a response. Another advantage of using an instruction-tuned LLM is that the focus of attention can be controlled, somewhat like the way humans control their own attention.

S2A includes two steps:

  • Given the context x, S2A first regenerates it as x′, removing the irrelevant parts of the context that would adversely affect the output. The paper writes this as x′ ∼ S2A(x).
  • Given x′, the regenerated context is then used in place of the original context to generate the final response of the LLM: y ∼ LLM(x′).
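The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `S2A_INSTRUCTION` is a made-up instruction (the paper's actual prompt is the one in its Figure 2), `llm` is any text-in, text-out callable, and `toy_llm` is a hypothetical stand-in so that the sketch runs without a real model.

```python
from typing import Callable

# Hypothetical instruction; the real S2A prompt is given in the paper's Figure 2.
S2A_INSTRUCTION = (
    "Rewrite the following text, keeping only the parts that are relevant "
    "and unbiased for answering the question it contains.\n\nText:\n"
)

def s2a_regenerate(llm: Callable[[str], str], x: str) -> str:
    """Step 1: x' ~ S2A(x). Ask the LLM to rewrite the context."""
    return llm(S2A_INSTRUCTION + x)

def s2a_respond(llm: Callable[[str], str], x: str) -> str:
    """Step 2: y ~ LLM(x'). Answer from the regenerated context only."""
    x_prime = s2a_regenerate(llm, x)
    return llm(x_prime)

def toy_llm(prompt: str) -> str:
    """Stand-in for a real model (an assumption, for offline testing only)."""
    if prompt.startswith(S2A_INSTRUCTION):
        text = prompt[len(S2A_INSTRUCTION):]
        # Pretend "rewriting" means dropping lines that state an opinion.
        kept = [line for line in text.splitlines() if "I think" not in line]
        return "\n".join(kept)
    return "RESPONSE TO: " + prompt

x = "Which city is the capital of Italy?\nI think the answer is Paris."
print(s2a_regenerate(toy_llm, x))
print(s2a_respond(toy_llm, x))
```

Because the second call sees only x′, the opinion in the original input cannot influence the final response. In a real system both calls would go to the same instruction-tuned model.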


Alternative implementations and variations

The paper studies several different versions of the S2A approach.

No context/question separation. In the implementation in Figure 2, the regenerated context is decomposed into two parts (context and question). Figure 12 shows a variation of this prompt that does not ask for this separation.


Keep the original context. In S2A, once the context has been regenerated it should contain all the necessary elements that need attention; the model then responds using only the regenerated context, and the original context is discarded. In this variant the original context is kept alongside the regenerated one. Figure 14 shows a variation of this prompt.
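As a rough sketch of the "keep the original context" variant, the prompt for the second step can carry both the original context x and the regenerated context x′, so that nothing is lost if the rewrite removed something important. The function name and the section labels below are hypothetical; the paper's actual wording is the prompt in its Figure 14.

```python
def build_keep_original_prompt(x: str, x_prime: str) -> str:
    """Combine the original context x with its regenerated version x'.

    The labels are illustrative assumptions, not the paper's prompt text.
    """
    return (
        "Original text:\n" + x + "\n\n"
        "Rewritten text (relevant parts only):\n" + x_prime
    )

x = "Which city is the capital of Italy?\nI think the answer is Paris."
x_prime = "Which city is the capital of Italy?"
prompt = build_keep_original_prompt(x, x_prime)
print(prompt)
```

The trade-off is that the model can once again attend to the irrelevant or opinionated parts of x, which the default S2A setup avoids by discarding them.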


Instructed prompting. The S2A prompt given in Figure 2 encourages removing opinionated text from the context, and the instructions in step 2 (Figure 13) ask for a response that is not opinionated.


Emphasizing relevance versus irrelevance. The S2A implementations described so far all emphasize regenerating the context to increase objectivity and reduce sycophancy. However, the paper notes that other aspects could be emphasized instead, for example relevance versus irrelevance. The prompt variant in Figure 15 gives an example.


Experiment

The paper runs experiments in three settings: factual question answering, long-form argument generation, and solving math word problems. It uses LLaMA-2-70B-chat as the base model, which is first evaluated in two settings:

  • Baseline: The input prompt provided in the dataset is fed to the model, which answers in a zero-shot manner. The model's generation can be affected by spurious correlations in the input.
  • Oracle Prompt: The prompt without additional opinions or irrelevant sentences is fed to the model, which answers in a zero-shot manner.
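The difference between the two settings can be sketched as a tiny evaluation harness. Everything here is hypothetical and for illustration only: the single example, the `Example` fields, the substring-match scoring, and `toy_sycophantic_model`, which simply repeats any suggested answer. None of it reproduces the paper's data or metrics.

```python
from typing import Callable, List, NamedTuple

class Example(NamedTuple):
    question: str    # the clean question
    distractor: str  # an added opinion or irrelevant sentence
    gold: str        # the reference answer

def baseline_prompt(ex: Example) -> str:
    """Baseline: the dataset prompt, distractor included."""
    return ex.question + " " + ex.distractor

def oracle_prompt(ex: Example) -> str:
    """Oracle Prompt: the same question with the distractor removed."""
    return ex.question

def accuracy(model: Callable[[str], str],
             build_prompt: Callable[[Example], str],
             examples: List[Example]) -> float:
    """Fraction of examples whose zero-shot answer contains the gold answer."""
    hits = sum(ex.gold.lower() in model(build_prompt(ex)).lower()
               for ex in examples)
    return hits / len(examples)

# Toy stand-in model (an assumption): it echoes any suggested answer.
FACTS = {"Which city is the capital of Italy?": "Rome"}

def toy_sycophantic_model(prompt: str) -> str:
    marker = "I think the answer is "
    if marker in prompt:
        return prompt.split(marker, 1)[1].rstrip(".")
    return FACTS.get(prompt, "unknown")

examples = [Example("Which city is the capital of Italy?",
                    "I think the answer is Paris.", "Rome")]
baseline_acc = accuracy(toy_sycophantic_model, baseline_prompt, examples)
oracle_acc = accuracy(toy_sycophantic_model, oracle_prompt, examples)
print(baseline_acc, oracle_acc)
```

An S2A run would use the same harness with a prompt builder that first regenerates the context, which is why its score can be compared directly against these two reference points.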

Figure 5 (left) shows the evaluation results on factual question answering. System 2 Attention is a large improvement over the original input prompt, reaching 80.3% accuracy, close to Oracle Prompt performance.


For long-form argument generation, the overall results show that Baseline, Oracle Prompt, and System 2 Attention are all rated as producing output of similarly high quality. Figure 6 (right) shows the breakdown of the results:


Figure 7 shows the results of the different methods on the GSM-IC task. Consistent with the results of Shi et al., the paper finds that baseline accuracy is much lower than oracle accuracy. The effect is even larger when the irrelevant sentences are on the same topic as the question, as shown in Figure 7 (right).


For more information, please refer to the original paper.

source:51cto.com