
The world model shines! The realism of these 20+ autonomous driving scenarios is incredible...

PHPz
Release: 2023-10-09 15:01:20

Do you think this is just an ordinary, boring self-driving video?



Not a single frame is "real".


It can simulate different road conditions, various kinds of weather, and more than 20 scenarios, and the results look just like the real thing.


The world model once again demonstrates its power! This time, even LeCun excitedly retweeted it after seeing the results. The effect above comes from the latest version of GAIA-1.

It has 9 billion parameters and was trained on 4,700 hours of driving video. Given video, text, or action inputs, it generates autonomous driving videos.

The most direct benefit is that it can better predict future events. It can simulate more than 20 types of scenarios, further improving the safety of autonomous driving while reducing costs.


The team behind it says this will change the rules of the autonomous driving game!

So how is GAIA-1 implemented? We previously covered GAIA-1, the generative world model for autonomous driving developed by the Wayve team, in detail on Autonomous Driving Daily; if you are interested, you can read the related content on our official account!

The bigger the model, the better the results

GAIA-1 is a multimodal generative world model that understands and generates the world by integrating inputs such as video, text, and action. Using deep learning, it learns the structure and rules of the world from large amounts of data, with the goal of approximating human perception and prediction so it can better understand and interact with the world. World models of this kind have broad applications in autonomous driving, robotics, virtual reality, and beyond, and with continued training and optimization GAIA-1 is expected to keep evolving into a more capable and more general world model.

It takes video, text, and action as input and generates realistic driving-scene videos, while allowing fine-grained control over the ego vehicle's behavior and scene characteristics. Videos can even be generated from text prompts alone.


The model works on the same principle as a large language model: predicting the next token.

It uses vector quantization to discretize video frames into tokens, so that predicting future scenes becomes predicting the next token in a sequence. A diffusion model is then used to generate high-quality videos from the world model's language space.
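To make this "next token" framing concrete, here is a minimal PyTorch sketch (our own illustration, not Wayve's code) of how frame features might be discretized against a learned codebook; the class, shapes, and sizes are hypothetical.

```python
# A minimal sketch (an illustration, not Wayve's code) of the idea above:
# discretize frame features against a learned codebook, so that predicting
# the next frame becomes predicting the next discrete token.
import torch

class FrameTokenizer(torch.nn.Module):
    """Hypothetical vector-quantization tokenizer for video frames."""
    def __init__(self, feat_dim=256, codebook_size=1024):
        super().__init__()
        self.codebook = torch.nn.Embedding(codebook_size, feat_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_patches, feat_dim) from any image encoder
        # squared L2 distance from every patch feature to every codebook entry
        dists = (frame_feats.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        token_ids = dists.argmin(dim=-1)          # (batch, num_patches) of ints
        quantized = self.codebook(token_ids)      # back to continuous vectors
        return token_ids, quantized

# With frames turned into sequences of integer tokens, "predicting the
# future" reduces to next-token prediction, exactly as in a language model.
```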

The specific steps are as follows:


The first step is straightforward: encode the various inputs and arrange them into a single sequence.

Specialized encoders project the different inputs into a shared representation: the text and video encoders embed their inputs separately, while the action representations are projected individually into the same shared space.

These encoded representations are kept temporally consistent.
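As a rough illustration of what "projecting into a shared representation" can look like, here is a hypothetical PyTorch sketch; the encoder output dimensions, the action format (speed, steering), and the interleaving scheme are assumptions for the example, not GAIA-1's actual design.

```python
# A rough illustration (assumed shapes, not GAIA-1's actual encoders) of
# projecting text, video, and action inputs into one shared token space
# while keeping the sequence ordered in time.
import torch

d_model = 512  # shared embedding width (an assumption for this sketch)

text_proj   = torch.nn.Linear(768, d_model)  # e.g. output of a text encoder
video_proj  = torch.nn.Linear(256, d_model)  # e.g. quantized frame features
action_proj = torch.nn.Linear(2, d_model)    # e.g. (speed, steering) per step

def build_sequence(text_emb, frame_emb, action_vec):
    """Interleave per-timestep text, image, and action tokens so that all
    tokens for time t come before all tokens for time t+1."""
    # text_emb: (T, n_text, 768), frame_emb: (T, n_img, 256), action_vec: (T, 2)
    per_step = []
    for t in range(frame_emb.shape[0]):
        per_step.append(torch.cat([
            text_proj(text_emb[t]),                  # (n_text, d_model)
            video_proj(frame_emb[t]),                # (n_img, d_model)
            action_proj(action_vec[t]).unsqueeze(0)  # (1, d_model)
        ], dim=0))
    return torch.cat(per_step, dim=0)  # one flat, time-ordered token sequence
```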

Once the inputs are arranged into a sequence, the key component, the world model, comes into play.

As an autoregressive Transformer, it predicts the next set of image tokens in the sequence, conditioning not only on the previous image tokens but also on the contextual information from text and actions.

The content generated by the model therefore stays consistent not only with the preceding images but also with the text and action context.
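The following toy sketch shows the autoregressive idea: a causal Transformer reads the interleaved token sequence and predicts the next token, just like a language model. It is a simplified stand-in for illustration, not Wayve's architecture, and all sizes are made up.

```python
# A toy sketch of the autoregressive world model described above (a
# simplification for illustration, not Wayve's actual architecture): a causal
# Transformer reads the interleaved text/image/action tokens and predicts the
# next token at every position.
import torch

class TinyWorldModel(torch.nn.Module):
    def __init__(self, vocab_size=1024, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, d_model)
        layer = torch.nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = torch.nn.TransformerEncoder(layer, n_layers)
        self.head = torch.nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) of mixed text/image/action token ids
        seq_len = token_ids.shape[1]
        # causal mask: each position may only attend to earlier positions
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.blocks(self.embed(token_ids), mask=causal)
        return self.head(h)  # next-token logits for every position

# Training objective: shift-by-one cross-entropy, exactly like a language
# model, except the "words" are image tokens and the context also contains
# text and action tokens.
```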

According to the team, the world model in GAIA-1 has 6.5 billion parameters and was trained on 64 A100s for 15 days.

Finally, a video decoder and a video diffusion model convert these tokens back into video.

This step determines the video's semantic quality, image fidelity, and temporal consistency.

GAIA-1's video decoder has 2.6 billion parameters and was trained for 15 days on 32 A100s.
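For intuition, here is a heavily simplified, hypothetical sampling loop showing how a diffusion-style decoder could turn the world model's token embeddings back into pixels. The `denoiser` interface and the update rule are illustrative assumptions, not GAIA-1's actual decoder.

```python
# A heavily simplified, hypothetical sampling loop for a diffusion-style
# decoder conditioned on the world model's tokens. The `denoiser` interface
# and the update rule are illustrative assumptions, not GAIA-1's decoder.
import torch

@torch.no_grad()
def decode_tokens_to_video(denoiser, token_embeddings, num_frames=8,
                           height=64, width=64, steps=50):
    """Iteratively denoise random noise into video frames, conditioned on
    the world model's predicted token embeddings."""
    video = torch.randn(1, num_frames, 3, height, width)  # start from noise
    for t in reversed(range(steps)):
        t_batch = torch.tensor([t])
        # the (hypothetical) denoiser predicts the noise to remove,
        # conditioned on the world model's tokens
        noise_pred = denoiser(video, t_batch, cond=token_embeddings)
        video = video - noise_pred / steps  # toy update, not the exact DDPM rule
    return video.clamp(-1, 1)
```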

It is worth noting that GAIA-1 not only resembles large language models in principle; it also exhibits the same characteristic that generation quality improves as model scale grows.


The team compared the early version released in June with the latest model; the latter is 480 times larger than the former.

You can clearly see that the videos have improved significantly in detail, resolution, and more.


In terms of practical applications, GAIA-1 is also expected to make an impact. The team behind it says it will change the rules of the autonomous driving game.


The reasons come from three aspects:

  • Safety
  • Comprehensive training data
  • Long-tail scenarios

First, in terms of safety, the world model can simulate the future and give the AI awareness of its own decisions, which is critical to the safety of autonomous driving.

Second, training data is also critical for autonomous driving, and generated data is safer, cheaper, and infinitely scalable.

Generative AI can also address a major challenge facing autonomous driving: long-tail scenarios. It can handle more edge cases, such as a pedestrian crossing the road in foggy weather, further improving the performance of autonomous driving.

Who is Wayve?

GAIA-1 comes from British autonomous driving startup Wayve.

Wayve was founded in 2017; its investors include Microsoft and others, and it has reached unicorn valuation.

The founders are current CEO Alex Kendall and Amar Shah (whose information no longer appears on the leadership page of the company's official website). Both graduated from the University of Cambridge with doctorates in machine learning.


In terms of technical route, like Tesla, Wayve advocates a purely vision-based, camera-only solution; it abandoned high-definition maps very early on and has firmly followed the "instant perception" route.

Not long ago, LINGO-1, another large model released by the team, also caused a sensation.

This self-driving model can generate explanations in real time while driving, further improving the model's interpretability.

In March this year, Bill Gates also took a test ride in one of Wayve's self-driving cars.


Paper address: https://arxiv.org/abs/2309.17080


Original link: https://mp.weixin.qq.com/s/bwTDovx9-UArk5lx5pZPag
