Just now, Google released a foundation world model: 11B parameters, able to generate interactive virtual worlds


WBOY

Aug 06, 2024, 12:18 AM

Google industry world model Genie

Generate a playable game world with one click.

Less than two weeks after Sora's debut, Google's world model has arrived, and its capabilities look even more powerful: the virtual worlds it generates are "autonomous and controllable." Google has just defined a new paradigm for generative AI: Generative Interactive Environments (Genie). Genie is an 11-billion-parameter foundation world model that can generate playable, interactive environments from a single image prompt.

We can prompt it with images it has never seen before and then interact with the virtual world we imagined.

Whether the prompt is a synthetic image, a photograph, or even a hand-drawn sketch, Genie can generate endless playable worlds from it.


Genie consists of three parts: a latent action model that infers the latent action between each pair of frames; a video tokenizer that converts raw video frames into discrete tokens; and a dynamics model that predicts the next frame of a video given a latent action and the tokens of past frames.

Seeing the release of this technology, many people said: Google is coming to lead AI technology again.


Google also proposes that the latent actions learned by Genie can transfer to real, human-designed environments. Building on this hypothesis, Google trained a Genie model on robot videos as a proof of concept for world model applications in robotics.

Disrupting the gaming, design, XR, and robotics industries...

We can understand the revolutionary significance of Genie from four dimensions.

First, Genie can learn controls without action labels.

Specifically, Genie is trained on a large dataset of publicly available Internet videos, without any action label data.

This is challenging because Internet videos usually carry no labels indicating which action is being performed or which part of the image should be controlled. Genie nevertheless learns fine-grained control purely from Internet videos.

Genie not only learns which parts of an observation are generally controllable, it also infers a variety of latent actions that stay consistent across generated environments. Note how the same latent action produces similar behavior across different prompt images.

Second, Genie can enable the next generation of "creators."

Creating an entirely new interactive environment from a single image opens the door to many new ways of generating and stepping into virtual worlds. For example, we can use a state-of-the-art text-to-image model to generate the starting frame and then use Genie to turn it into a dynamic, interactive environment.

In one demonstration, Google used Imagen 2 to generate images and then used Genie to bring them to life.


Genie can do more than that: it can also be applied to human creative work such as sketches, or to real-world images.

Third, Google believes that Genie is a cornerstone for realizing general-purpose agents. Previous research has shown that game environments can be effective testbeds for developing AI agents, but such work is often limited by the number of games available.

With Genie, future AI agents can be trained on a never-ending curriculum of newly generated worlds. Google presents a proof of concept showing that the latent actions learned by Genie can transfer to real, human-designed environments.

Finally, Google stated that Genie is a general method that can be applied to multiple domains without requiring any additional domain knowledge.

Although the training data consists mainly of 2D platformer gameplay and robot videos, the method is general, applies to any type of domain, and can be scaled to larger Internet datasets.

Google trained a smaller 2.5B model on action-free videos from RT1. As with the Platformers data, trajectories with the same latent action sequence often exhibit similar behavior.

This shows that Genie can learn a consistent action space, which may make it suitable for training robots toward general-purpose embodied agents.

Technology Revealed: The paper "Genie: Generative Interactive Environments" has been released

Google DeepMind has released the Genie paper.


Paper address: https://arxiv.org/pdf/2402.15391.pdf
Project homepage: https://sites.google.com/view/genie-2024/home?pli=1

The paper has as many as six co-first authors, including Chinese scholar Yuge (Jimmy) Shi. She is currently a research scientist at Google DeepMind and received her PhD in machine learning from the University of Oxford in 2023.


Method Introduction

Several components of the Genie architecture are built on the Vision Transformer (ViT). Notably, the quadratic memory cost of the Transformer poses a challenge for video, since a single video can contain up to O(10^4) tokens. Google therefore uses a memory-efficient ST-transformer architecture (see Figure 4) in all model components to balance model capacity against computational constraints.
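To make the quadratic cost concrete, here is a rough back-of-the-envelope sketch. It assumes that a memory-efficient spatiotemporal design splits attention into a spatial pass within each frame and a temporal pass across frames; the frame count and tokens-per-frame values are hypothetical, chosen only so that the total is on the order of 10^4 tokens. It counts attention pairs and ignores any masking.

```python
def full_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Token pairs scored when every token attends to every other token."""
    total_tokens = num_frames * tokens_per_frame
    return total_tokens ** 2


def factorized_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Token pairs scored by one spatial pass plus one temporal pass."""
    # Spatial pass: each frame attends only within itself.
    spatial = num_frames * tokens_per_frame ** 2
    # Temporal pass: each spatial position attends only across frames.
    temporal = tokens_per_frame * num_frames ** 2
    return spatial + temporal


# Hypothetical sizes: 16 frames x 640 tokens = 10,240 tokens in total.
T, N = 16, 640
full = full_attention_pairs(T, N)
factored = factorized_attention_pairs(T, N)
print(f"full: {full:,}  factorized: {factored:,}  ratio: {full / factored:.1f}x")
```

Under these assumed sizes the factorized layout scores roughly fifteen times fewer pairs, and the gap widens as the number of frames grows, because the dominant spatial term scales linearly rather than quadratically with the frame count.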


Genie contains three key components:

1) Latent action model (LAM), used to infer the latent action between each pair of frames;

2) Video tokenizer, used to convert raw video frames into discrete tokens;

3) Dynamics model, used to predict the next frame of the video given a latent action and the tokens of past frames.


Specifically:

Latent action model: To achieve controllable video generation, Google conditions each future frame prediction on the action taken at the previous frame. However, such action labels are rarely available for Internet videos, and obtaining action annotations can be expensive. Instead, Google learns latent actions in a completely unsupervised manner (see Figure 5).
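As a loose analogy for how discrete actions can be recovered without labels, the toy below clusters frame-to-frame changes in an unlabeled sequence. This is not Genie's latent action model, which is a learned neural network operating on video frames; plain 1-D k-means is swapped in here purely for illustration, and the "video" is a hypothetical list of character x-positions.

```python
def kmeans_1d(values, initial_centers, iterations=10):
    """Plain 1-D k-means: returns the final centers and each value's cluster index."""
    centers = list(initial_centers)

    def nearest(v):
        return min(range(len(centers)), key=lambda k: abs(v - centers[k]))

    for _ in range(iterations):
        labels = [nearest(v) for v in values]
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # leave an empty cluster's center unchanged
                centers[k] = sum(members) / len(members)
    return centers, [nearest(v) for v in values]


# Unlabeled "video": x-position of a character in consecutive frames (toy data).
positions = [0, 2, 4, 4, 4, 2, 0, 0, 2, 4]
deltas = [b - a for a, b in zip(positions, positions[1:])]

# Each cluster index plays the role of a discrete latent action.
centers, latent_actions = kmeans_1d(deltas, initial_centers=[-1.0, 0.0, 1.0])
print(centers)
print(latent_actions)
```

No frame was ever labeled "left", "stay", or "right", yet the three clusters end up corresponding to those motions. The same intuition, that a small discrete code can summarize what changed between two frames, is what lets a user later steer generation by choosing a code.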


Video tokenizer: Following previous research, Google compresses videos into discrete tokens to reduce dimensionality and achieve higher-quality video generation (see Figure 6). For implementation, Google uses a VQ-VAE, which takes $T$ frames of a video $x_{1:T} = (x_1, x_2, \dots, x_T) \in \mathbb{R}^{T \times H \times W \times C}$ as input and generates a discrete representation for each frame, $z_{1:T} = (z_1, z_2, \dots, z_T) \in \mathbb{I}^{T \times D}$, where $D$ is the size of the discrete latent space. The tokenizer is trained on the entire video sequence using the standard VQ-VAE objective.
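The quantization step at the heart of a VQ-VAE can be sketched in a few lines: each continuous encoder output is replaced by the index of its nearest codebook entry. The codebook and the encoder outputs below are toy values, and the encoder and decoder networks themselves are omitted; this is a minimal illustration of the lookup, not Genie's tokenizer.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def dequantize(tokens, codebook):
    """Look up the codebook vector for each discrete token."""
    return [codebook[t] for t in tokens]


# Toy 4-entry codebook and toy encoder outputs for three image patches.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_embeddings = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

tokens = quantize(patch_embeddings, codebook)
print(tokens)  # [0, 3, 2]
print(dequantize(tokens, codebook))
```

Downstream models then operate on the short integer sequence rather than on raw pixels, which is where the dimensionality reduction comes from.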


Dynamics model: a decoder-only MaskGIT transformer (Figure 7).
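For readers unfamiliar with MaskGIT, its decoding idea is to start from a fully masked token grid and fill it in over a few parallel steps, committing the most confident predictions first according to a cosine schedule. The sketch below shows that general procedure only; `toy_predictor` is a hypothetical stand-in for the transformer, and the step count and schedule details are illustrative rather than Genie's actual sampling settings.

```python
import math

MASK = -1  # sentinel for a not-yet-decoded token


def toy_predictor(tokens):
    """Hypothetical stand-in for the transformer: (token, confidence) per position."""
    return [((i * 7) % 5, 1.0 / (1 + (i * 3) % 4)) for i in range(len(tokens))]


def maskgit_decode(num_tokens, steps, predictor):
    """Iteratively unmask tokens, most confident first, on a cosine schedule."""
    tokens = [MASK] * num_tokens
    for step in range(1, steps + 1):
        # Number of tokens that should remain masked after this step.
        keep_masked = math.floor(num_tokens * math.cos(math.pi / 2 * step / steps))
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        predictions = predictor(tokens)
        # Rank still-masked positions by predicted confidence.
        masked.sort(key=lambda i: predictions[i][1], reverse=True)
        num_to_commit = max(0, len(masked) - keep_masked)
        for i in masked[:num_to_commit]:
            tokens[i] = predictions[i][0]
    return tokens


decoded = maskgit_decode(num_tokens=8, steps=4, predictor=toy_predictor)
print(decoded)
```

Because many tokens are committed per step, a frame can be produced in a handful of forward passes instead of one pass per token, which matters when each frame contains hundreds of tokens.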


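Genie's inference process is as follows: the prompt image is tokenized as the initial frame, the user chooses a discrete latent action, and the dynamics model predicts the tokens of the next frame, which the tokenizer's decoder turns back into an image. Repeating this step with each new action generates the interactive sequence.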
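The way the tokenizer and the dynamics model fit together at inference time can be sketched as a simple rollout loop. Everything below is a hypothetical stand-in: the function names, the toy 4-level "tokenizer", and the toy "dynamics" rule are assumptions made so that the loop runs end to end, and they say nothing about how the real networks behave.

```python
from typing import Callable, List, Sequence

Frame = List[int]   # toy "image": a flat list of pixel values
Tokens = List[int]  # discrete tokens for one frame


def rollout(
    prompt_frame: Frame,
    actions: Sequence[int],
    encode: Callable[[Frame], Tokens],
    predict_next: Callable[[List[Tokens], List[int]], Tokens],
    decode: Callable[[Tokens], Frame],
) -> List[Frame]:
    """Generate one new frame per user-chosen latent action."""
    token_history = [encode(prompt_frame)]
    action_history: List[int] = []
    frames = [prompt_frame]
    for action in actions:
        action_history.append(action)
        next_tokens = predict_next(token_history, action_history)
        token_history.append(next_tokens)
        frames.append(decode(next_tokens))
    return frames


# Toy stand-ins (hypothetical, only so the loop is runnable).
def toy_encode(frame: Frame) -> Tokens:
    return [pixel // 64 for pixel in frame]  # 4-level quantizer


def toy_decode(tokens: Tokens) -> Frame:
    return [token * 64 for token in tokens]


def toy_predict_next(token_history: List[Tokens], action_history: List[int]) -> Tokens:
    # The latest action shifts every token of the latest frame.
    return [(token + action_history[-1]) % 4 for token in token_history[-1]]


frames = rollout([0, 64, 128, 192], [1, 1, 0], toy_encode, toy_predict_next, toy_decode)
for frame in frames:
    print(frame)
```

The latent action model does not appear in the loop: in this sketch the actions come directly from the user, and the model only needs the token history plus the chosen action index to continue the sequence.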


Experimental results

Scaling results

To study the scaling behavior of the model, Google ran experiments on models ranging from 41M to 2.7B parameters, exploring the impact of both model size and batch size. The results are shown in Figure 9.


As the model size increases, the final training loss decreases. This is a strong indication that the Genie approach benefits from scaling. Increasing the batch size also improves model performance.

Qualitative results

Google presents qualitative results for the 11B-parameter Genie model trained on the Platformers dataset and for a smaller model trained on the Robotics dataset. The results show that Genie can generate high-quality, controllable videos across different domains. Notably, Google uses only out-of-distribution (OOD) image prompts to qualitatively evaluate the model trained on Platformers, demonstrating the robustness of the Genie approach and the value of large-scale training data.


Agent training. Genie may one day serve as a foundation world model for training multi-task agents. In Figure 14, the authors show that, given a starting frame, the model can already generate diverse trajectories in a novel RL environment.


The authors run evaluations in CoinRun, a procedurally generated 2D platformer environment, and compare against an oracle behavioral cloning (BC) model that has access to expert actions and therefore serves as an upper bound.


Ablation studies. When designing the latent action model, the authors carefully considered which type of input to use. The final choice was raw images (pixels), but the authors also evaluated the alternative of using tokenized images (replacing x with z in Figure 5). This alternative is called the "token input" model (see Table 2).


Tokenizer architecture ablation. The authors compared the performance of three tokenizer choices: 1) (spatial-only) ViT, 2) (spatial-temporal) ST-ViViT, and 3) (spatial-temporal) C-ViViT (Table 3).

