Transformers+world model, can it save deep reinforcement learning?

王林

May 04, 2023 am 09:19 AM

Many people know that AlphaGo, which defeated Lee Sedol, Ke Jie and other top international Go players, went through three iterations: the first-generation AlphaGo Lee, which defeated Lee Sedol; the second-generation AlphaGo Master, which defeated Ke Jie; and the third-generation AlphaGo Zero, which beat both of its predecessors.

AlphaGo's playing strength grew from generation to generation. Behind this is a clear trend in AI technology: the increasing share of reinforcement learning.

In recent years, reinforcement learning has undergone another "evolution". People call the "evolved" reinforcement learning deep reinforcement learning.

But the sample efficiency of deep reinforcement learning agents is low, which greatly limits their application in practical problems.

Recently, many model-based methods have been designed to solve this problem, and learning in the imagination of world models is one of the most prominent methods.

However, while nearly unlimited interaction with a simulated environment sounds appealing, the world model must remain accurate over long periods of time.

Inspired by the success of Transformers in sequence modeling tasks, Vincent Micheli, Eloi Alonso, and François Fleuret of the University of Geneva introduced IRIS, a data-efficient agent that learns inside a world model composed of a discrete autoencoder and an autoregressive Transformer.
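The article does not spell out how a discrete autoencoder turns observations into something a Transformer can model. One common approach, and the one suggested by the VQ-VAE mentioned in the reader comments quoted later, is to snap each continuous latent vector to its nearest entry in a learned codebook and use that entry's index as a token. The Python sketch below illustrates only this lookup step. The codebook, the latent vectors, and the function names are hypothetical, and this is not the authors' implementation.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def tokenize(latents, codebook):
    """Turn a list of continuous latent vectors into discrete token ids."""
    return [nearest_code(v, codebook) for v in latents]


def detokenize(tokens, codebook):
    """Look the token ids back up to get the quantized vectors a decoder would see."""
    return [codebook[t] for t in tokens]


# Hypothetical 2-D codebook with four entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

tokens = tokenize(latents, codebook)        # discrete ids a sequence model can predict
quantized = detokenize(tokens, codebook)    # what a decoder would reconstruct from
```

Once observations are sequences of token ids, predicting the next token is the same kind of problem that autoregressive Transformers already handle in language modeling.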

On the Atari 100k benchmark, over the equivalent of just two hours of gameplay, IRIS achieved an average human-normalized score of 1.046 and outperformed humans in 10 out of 26 games.
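For readers unfamiliar with the metric, a human-normalized score is conventionally computed per game as (agent - random) / (human - random), so that 0 matches a random policy and 1 matches the human reference. The short Python sketch below shows the calculation and how a mean score and a count of "better than human" games follow from it. The game names and raw scores are made up for illustration and are not results from the paper.

```python
def human_normalized_score(agent, random_score, human):
    """Conventional normalization: 0 = random policy, 1 = human reference."""
    return (agent - random_score) / (human - random_score)


# Hypothetical raw scores (agent, random, human) for three imaginary games.
example_scores = {
    "game_a": (120.0, 20.0, 70.0),
    "game_b": (15.0, 10.0, 60.0),
    "game_c": (300.0, 100.0, 500.0),
}

normalized = {
    game: human_normalized_score(agent, random_score, human)
    for game, (agent, random_score, human) in example_scores.items()
}

# The benchmark headline number is the mean over games.
mean_score = sum(normalized.values()) / len(normalized)

# A game counts as "better than human" when its normalized score exceeds 1.
superhuman_games = [game for game, score in normalized.items() if score > 1.0]
```

A mean above 1 does not imply that most games are above 1: a few very high scores can pull the average up, which is consistent with a mean of 1.046 alongside 10 superhuman games out of 26.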

LeCun previously said that reinforcement learning would lead to a dead end.

Now it seems that Vincent Micheli, Eloi Alonso and François Fleuret of the University of Geneva are integrating world models with reinforcement learning (more precisely, deep reinforcement learning), and the bridge connecting the two is the Transformer.

What’s different about deep reinforcement learning

When it comes to artificial intelligence technology, what many people think of first is deep learning.

In fact, although deep learning is still active in the field of AI, many of its problems have been exposed.

The most commonly used form of deep learning today is supervised learning, which can be understood as "learning with reference answers". One of its characteristics is that data must be labeled before it can be used for training. But most data today is unlabeled, and labeling is very expensive.

So much so that some people joke that "there is only as much intelligence as there is manual labor."

Many researchers, including many experts, are reflecting on whether deep learning is "wrong".

So, reinforcement learning began to rise.

Reinforcement learning is different from supervised learning and unsupervised learning. An agent learns through continual trial and error, and is rewarded or penalized according to the outcome of each attempt. This is the approach DeepMind used to build its board-game, card-game and video-game AIs. Proponents of this path believe that as long as the reward incentives are set correctly, reinforcement learning will eventually create a real AGI.
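The trial-and-error loop described above can be sketched with tabular Q-learning, a classic reinforcement learning algorithm chosen here for brevity. The five-cell corridor environment, the hyperparameters, and all names below are assumptions made for illustration and do not come from the article.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3


def step(state, action_index):
    """Move along the corridor; reward 1.0 only on reaching the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # one value per (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Trial: act at random with probability EPSILON, or when values are tied.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Error: nudge the estimate toward reward + discounted future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
```

The agent is never told that moving right is correct. It only sees a reward at the goal, and the value estimates propagate backward from there over many episodes, which is also why such methods tend to need a lot of interaction data.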

But reinforcement learning also has problems. In LeCun’s words, “reinforcement learning requires a huge amount of data to train the model to perform the simplest tasks.”

So reinforcement learning and deep learning were combined to become deep reinforcement learning.

In deep reinforcement learning, reinforcement learning is the skeleton and deep learning is the soul. What does this mean? The main operating mechanism of deep reinforcement learning is essentially the same as that of reinforcement learning, except that a deep neural network is used to carry out the process.

Moreover, some deep reinforcement learning algorithms are obtained simply by adding a deep neural network to an existing reinforcement learning algorithm. The well-known DQN algorithm is a typical example.
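To make the "add a neural network to an existing algorithm" idea concrete, the sketch below replaces a Q-table with a small network that maps a state vector to one Q-value per action, then computes the DQN-style learning target y = r + gamma * max Q_target(s', a') using a frozen copy of the network. It is a forward-pass-only illustration in plain Python: the network size, the sample transitions, and all function names are hypothetical, and the gradient step a real implementation would take on the loss is omitted.

```python
import math
import random


def make_mlp(n_in, n_hidden, n_out, rng):
    """Randomly initialised weights for a one-hidden-layer network."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"w1": matrix(n_hidden, n_in), "w2": matrix(n_out, n_hidden)}


def q_values(params, state):
    """Forward pass: state vector in, one Q-value per action out."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in params["w1"]]
    return [sum(w * h for w, h in zip(row, hidden)) for row in params["w2"]]


def td_targets(target_params, batch, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrap on terminal steps."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(target_params, next_state))
        targets.append(reward + bootstrap)
    return targets


def td_loss(online_params, target_params, batch, gamma=0.99):
    """Mean squared TD error that a gradient step on the online network would reduce."""
    targets = td_targets(target_params, batch, gamma)
    errors = [
        (q_values(online_params, state)[action] - y) ** 2
        for (state, action, _r, _s2, _d), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


rng = random.Random(0)
online = make_mlp(n_in=4, n_hidden=8, n_out=2, rng=rng)
# Frozen copy of the online weights, standing in for the target network.
target = {name: [row[:] for row in weights] for name, weights in online.items()}

# Hypothetical replay-buffer sample: (state, action, reward, next_state, done).
batch = [
    ([0.1, 0.0, -0.2, 0.3], 1, 1.0, [0.2, 0.1, -0.1, 0.3], False),
    ([0.5, -0.4, 0.0, 0.1], 0, 0.0, [0.0, 0.0, 0.0, 0.0], True),
]
loss = td_loss(online, target, batch)
```

The update rule is the same one used by tabular Q-learning. Only the storage of Q-values changes, from a table indexed by state to a function that can generalize across states it has never seen.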

What’s so magical about Transformers

Transformers first appeared in 2017 and were proposed in Google’s paper “Attention is All You Need”.

Before the emergence of the Transformer, progress in artificial intelligence on language tasks had lagged behind other fields. "Natural language processing has been somewhat of a latecomer to this deep learning revolution that's happened over the past decade," says Anna Rumshisky, a computer scientist at the University of Massachusetts Lowell. "In a sense, NLP was lagging behind computer vision. Transformers changed this."

In recent years, the Transformer machine learning model has become one of the main highlights of the advancement of deep learning and deep neural network technology. It is mainly used for advanced applications in natural language processing. Google is using it to enhance its search engine results.

The Transformer quickly became the leading approach in applications, such as word recognition, that focus on analyzing and predicting text. It sparked a wave of tools like OpenAI's GPT-3, which are trained on hundreds of billions of words and can generate coherent new text.

Currently, the Transformer architecture continues to evolve and expand into many different variants, extending from language tasks to other domains. For example, Transformer has been used for time series prediction and is also the key innovation behind DeepMind’s protein structure prediction model AlphaFold.

Transformers have also recently entered the field of computer vision, where they are gradually replacing convolutional neural networks (CNNs) in many complex tasks.

World Model and Transformers join forces, what do other people think

Regarding the University of Geneva team's results, some overseas netizens commented: "Please note that these two hours refer to the amount of experience collected from the environment; training on the GPU takes a week."

Others asked: so this system learns inside a particularly accurate latent world model? Does the model require no pre-training?

In addition, some people feel that the results of Vincent Micheli and his co-authors are not groundbreaking: "It seems that they just trained the world model, VQ-VAE and actor-critic, all from a replay buffer of those 2 hours of experience (and about 600 epochs)."

Reference: https://www.reddit.com/r/MachineLearning/comments/x4e4jx/r_transformers_are_sample_efficient_world_models/
