
Directly expands to infinite length, Google Infini-Transformer ends the context length debate

Apr 13, 2024, 08:00 AM

I wonder if Gemini 1.5 Pro uses this technology.

Google has made another big move, releasing Infini-Transformer, its next-generation Transformer design.

Infini-Transformer introduces an efficient way to scale Transformer-based large language models (LLMs) to infinitely long inputs without increasing memory and computational requirements. Using this technique, the researchers extended the context length of a 1B model to 1 million tokens; applied to an 8B model, it handles a 500K-length book summarization task.

The Transformer architecture has dominated the field of generative artificial intelligence since the publication of the groundbreaking paper "Attention is All You Need" in 2017, and Google has been iterating on it frequently of late. Just days ago it released Mixture-of-Depths (MoD), which changes how Transformers allocate compute, and within a few days it has followed up with this new study.

Researchers in AI understand the importance of memory: it is a cornerstone of intelligence and underpins efficient computation for LLMs. However, Transformers and Transformer-based LLMs exhibit quadratic complexity in both memory usage and computation time due to the inherent nature of the attention mechanism. For example, for a 500B model with a batch size of 512 and a context length of 2048, the memory footprint of the attention key-value (KV) states is about 3 TB. In practice, LLMs often need to be extended to much longer sequences (such as 1 million tokens), which brings enormous memory overhead, and deployment costs grow with the context length.
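To put that figure in perspective, here is a back-of-the-envelope sketch of how the KV-cache footprint is computed. The layer count, attention width, and precision below are illustrative assumptions rather than values from the paper; they are chosen only to show how quickly the cache reaches terabyte scale.

```python
# Rough KV-cache footprint for a large dense Transformer at serving time.
# All model dimensions here are assumptions for illustration, not the paper's figures.
batch_size = 512
context_len = 2048
num_layers = 96        # assumed
d_attn = 8192          # assumed total attention width (heads x head_dim)
bytes_per_value = 2    # bf16

# Every layer caches one key vector and one value vector per token.
kv_bytes = batch_size * context_len * num_layers * 2 * d_attn * bytes_per_value
print(f"KV cache ~= {kv_bytes / 1e12:.1f} TB")  # ~3.3 TB under these assumptions
```

The exact number depends on the architecture, but the scaling is the point: the cache grows linearly with batch size and context length, so stretching the context to 1M tokens multiplies this footprint by roughly 500x.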

Based on this, Google has introduced an effective approach whose key component is a new attention technique called Infini-attention. Traditional Transformers with local attention discard old segments to free up memory for new ones; Infini-attention instead adds a compressive memory that stores the old segments in compressed form. When producing output, it aggregates the current context with the information retrieved from the compressive memory, so the model can recover the complete context history.

This method enables Transformer LLMs to scale to infinitely long contexts with bounded memory and to process extremely long inputs in a streaming fashion.

Experiments show that the method outperforms baselines on long-context language modeling benchmarks while using more than 100x fewer memory parameters. The model achieves better perplexity when trained with a 100K sequence length. In addition, a 1B model fine-tuned on passkey instances of up to 5K sequence length solved the passkey retrieval task at 1M length. Finally, the paper shows that an 8B model with Infini-attention reached a new SOTA result on a 500K-length book summarization task after continual pre-training and task fine-tuning.

The contributions of this article are summarized as follows:

  • Introduces a practical yet powerful attention mechanism, Infini-attention, which combines long-term compressive memory with local causal attention to effectively model both long- and short-range context dependencies;
  • Infini-attention makes only a minimal change to standard scaled dot-product attention and is designed to support plug-and-play continual pre-training and long-context adaptation;
  • The approach enables Transformer LLMs to process extremely long inputs in a streaming manner, scaling to infinitely long contexts with bounded memory and compute.
  • Paper link: https://arxiv.org/pdf/2404.07143.pdf
  • Paper title: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Method introduction

Infini-attention enables Transformer LLMs to efficiently handle infinitely long inputs with a bounded memory footprint and bounded computation. As shown in Figure 1 of the paper, Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds both masked local attention and long-term linear attention into a single Transformer block.
This subtle but critical modification to the Transformer attention layer can extend the context window of existing LLMs to infinite lengths through continuous pre-training and fine-tuning.

Infini-attention reuses all of the key, value, and query states from the standard attention computation for long-term memory consolidation and retrieval: rather than discarding old KV states as standard attention does, it stores them in a compressive memory. When processing subsequent sequences, Infini-attention uses the attention query states to retrieve values from that memory. To compute the final contextual output, Infini-attention aggregates the values retrieved from long-term memory with the local attention context.
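The mechanism just described can be sketched compactly. Below is a minimal single-head numpy illustration, assuming the ELU+1 feature map, a sigmoid-gated combination, and the simpler linear memory update; learned projections, multi-head handling, and the paper's delta-rule update variant are omitted, and the variable names are ours rather than the paper's.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def elu_plus_one(x):
    # ELU(x) + 1, used as the feature map sigma(.) for keys and queries.
    return np.where(x > 0.0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def infini_attention_segment(Q, K, V, M, z, beta):
    """Single-head Infini-attention over one segment.
    Q, K, V : (seg_len, d) projections of the current segment.
    M       : (d, d) compressive memory carried over from earlier segments.
    z       : (d,)   normalization term carried over from earlier segments.
    beta    : scalar gate mixing memory retrieval with local attention.
    """
    seg_len, d = Q.shape

    # 1) Retrieve long-term context from the compressive memory.
    sigma_q = elu_plus_one(Q)                              # (seg_len, d)
    A_mem = (sigma_q @ M) / (sigma_q @ z + 1e-6)[:, None]  # (seg_len, d)

    # 2) Standard causal dot-product attention within the segment.
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones((seg_len, seg_len), dtype=bool), k=1)
    A_dot = softmax(np.where(mask, -np.inf, scores)) @ V   # (seg_len, d)

    # 3) Gate the two context sources together.
    g = 1.0 / (1.0 + np.exp(-beta))                        # sigmoid(beta)
    A = g * A_mem + (1.0 - g) * A_dot

    # 4) Fold this segment's KV states into the memory instead of discarding them.
    sigma_k = elu_plus_one(K)
    M = M + sigma_k.T @ V                                  # stays (d, d)
    z = z + sigma_k.sum(axis=0)                            # stays (d,)
    return A, M, z

# Streaming over an arbitrarily long input, one segment at a time.
rng = np.random.default_rng(0)
d, seg_len, n_segments = 64, 128, 8
M, z, beta = np.zeros((d, d)), np.zeros(d), 0.0
for _ in range(n_segments):
    # In a real model, Q, K, V come from learned projections of the segment's hidden states.
    Q, K, V = rng.standard_normal((3, seg_len, d))
    out, M, z = infini_attention_segment(Q, K, V, M, z, beta)
```

Because M and z have fixed shapes regardless of how many segments have been consumed, the state carried across segments never grows, which is what allows streaming over arbitrarily long inputs with bounded memory.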

As shown in Figure 2 of the paper, the research team compared the Infini-attention-based Infini-Transformer with Transformer-XL. Like Transformer-XL, Infini-Transformer operates over a sequence of segments and computes standard causal dot-product attention within each segment, so the dot-product attention computation is local in a sense.
However, local attention discards the attention states of the previous segment when processing the next one, whereas Infini-Transformer reuses the old KV attention states, maintaining the entire context history in compressive memory. Each attention layer of Infini-Transformer therefore carries both a global compressive state and a local fine-grained state.

As with multi-head attention (MHA), Infini-attention maintains H parallel compressive memories in addition to the dot-product attention (where H is the number of attention heads).
Table 1 of the paper lists the context-memory footprint and effective context length of several models, expressed in terms of model parameters and input segment length. Infini-Transformer supports an unbounded context window with a bounded memory footprint.
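To make that contrast concrete: the per-head state implied by the mechanism above is a d_key x d_value memory matrix plus a d_key-dimensional normalization vector, which stays the same size however long the input grows, whereas a standard KV cache grows linearly with the number of cached tokens. The head dimensions in this quick comparison are assumptions, not values taken from the paper.

```python
# Per-head state size (in floats), under assumed per-head dimensions.
d_key = d_value = 128                                     # assumed head dimensions
infini_state = d_key * d_value + d_key                    # M (d_key x d_value) plus z (d_key): constant
kv_cache = lambda n_tokens: (d_key + d_value) * n_tokens  # grows with cached tokens

print(infini_state)          # 16512 floats, independent of input length
print(kv_cache(1_000_000))   # 256000000 floats at a 1M-token context
```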
Experiment

The study evaluates the Infini-Transformer model on benchmarks with extremely long input sequences: long-context language modeling, passkey context-block retrieval at up to 1M length, and book summarization at 500K length. For language modeling, the researchers trained models from scratch; for the passkey and book summarization tasks, they continually pre-trained existing LLMs to demonstrate Infini-attention's plug-and-play long-context adaptability.

Long-context language modeling. The results in Table 2 show that Infini-Transformer outperforms both the Transformer-XL and Memorizing Transformers baselines while requiring 114x fewer memory parameters than the Memorizing Transformer model.
Passkey retrieval task. Table 3 shows Infini-Transformer, fine-tuned on 5K-length inputs, solving the passkey task for context lengths up to 1M. The input length in the experiments ranges from 32K to 1M tokens. For each test subset, the researchers control the position of the passkey so that it falls near the beginning, middle, or end of the input sequence, and both zero-shot and fine-tuned accuracy are reported. After 400 steps of fine-tuning on 5K-length inputs, Infini-Transformer solves the task up to 1M context length.
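For readers unfamiliar with the passkey setup: the task hides a short random code inside long filler text and asks the model to retrieve it. The sketch below constructs such an input in the generic style used in prior passkey-retrieval work; the exact filler sentences and prompt template used in this paper may differ.

```python
import random

def make_passkey_prompt(num_filler_blocks, position):
    """Build a synthetic passkey-retrieval input.
    position in [0, 1] places the passkey near the beginning, middle, or end.
    The filler text and phrasing are illustrative, not the paper's exact template.
    """
    passkey = random.randint(10000, 99999)
    filler = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again. "
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    before = filler * int(num_filler_blocks * position)
    after = filler * int(num_filler_blocks * (1.0 - position))
    question = "What is the pass key? The pass key is"
    return before + needle + after + question, passkey

# Example: hide the passkey roughly in the middle of a long input.
prompt, answer = make_passkey_prompt(num_filler_blocks=20_000, position=0.5)
```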
Summarization task. Table 4 compares Infini-Transformer with encoder-decoder models built specifically for summarization. The results show that Infini-Transformer surpasses the previous best results and achieves a new SOTA on BookSum by processing the full text of the books.
The researchers also plot the overall Rouge score on the BookSum validation split in Figure 4. The trend shows that Infini-Transformer's summarization performance improves as the input length increases.

