On December 27, Meta AI's Aravind Rajeswaran announced the MoDem world model on Twitter. As of the evening of the 27th, the tweet had been viewed 73.9k times.
He said that, given only 5 demonstrations, MoDem can solve tasks with sparse rewards and high-dimensional action spaces within 100K interaction steps, significantly outperforming existing state-of-the-art methods on challenging visual motor control tasks. How good is it? In low-data regimes, MoDem's success rate on sparse-reward tasks is 150%-250% higher than that of previous methods.
LeCun also retweeted the research, noting that MoDem's model architecture is similar to JEPA and makes predictions in representation space without needing a decoder.
The editor has put the links below; if you are interested, take a look~
Paper link: https://arxiv.org/abs/2212.05698
GitHub link: https://github.com/facebookresearch/modem
Research Innovation and Model Architecture
Low sample efficiency is the main challenge for deploying deep reinforcement learning (RL) algorithms in practical applications, especially for visuomotor control. Model-based RL has the potential to achieve high sample efficiency by simultaneously learning a world model and using synthetic rollouts for planning and policy improvement.
In practice, however, sample-efficient learning in model-based RL is bottlenecked by exploration challenges, and this research tackles exactly those challenges.
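To make the idea concrete, here is a minimal sketch of the model-based RL loop described above (not the authors' implementation): a learned latent dynamics model and reward predictor score candidate action sequences through purely synthetic rollouts. All dimensions, module shapes, and the random-shooting planner are illustrative assumptions.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim, horizon = 32, 4, 64, 5

# Learned "world model": encoder + latent dynamics + reward predictor.
encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ELU())
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, latent_dim), nn.ELU())
reward_head = nn.Linear(latent_dim + act_dim, 1)

def imagine_return(obs, action_seq):
    """Score an action sequence via a synthetic rollout inside the learned model."""
    z = encoder(obs)                               # (1, latent_dim)
    total = torch.zeros(1)
    for a in action_seq:                           # action_seq: (horizon, act_dim)
        za = torch.cat([z, a.unsqueeze(0)], dim=-1)
        total = total + reward_head(za).squeeze()  # predicted one-step reward
        z = dynamics(za)                           # predicted next latent state
    return total

# Planning by random shooting: sample candidate plans, keep the best-scoring one.
obs = torch.randn(1, obs_dim)
candidates = torch.randn(64, horizon, act_dim)
returns = torch.stack([imagine_return(obs, plan) for plan in candidates])
best_plan = candidates[returns.argmax()]
first_action = best_plan[0]   # the action that would be executed in the real environment
```

Because the rollouts happen entirely inside the learned model, the agent can evaluate many candidate plans without spending extra environment interactions, which is where the sample-efficiency gain comes from.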
The model architecture is similar to Yann LeCun's JEPA and does not require a decoder.
The author Aravind Rajeswaran said that, compared with Dreamer, which needs a decoder for pixel-level prediction and has a heavy architecture, the decoder-free design can directly plug in visual representations pre-trained with self-supervised learning (SSL).
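As a rough illustration of the decoder-free idea (assumptions, not MoDem's released code): a frozen, SSL-pretrained visual encoder produces representations, and the dynamics model is trained to predict the next representation directly, so no pixel decoder is involved. Here a plain torchvision ResNet stands in for a pre-trained encoder such as R3M.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Stand-in for an SSL-pretrained visual encoder (hypothetical substitute for R3M).
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()
for p in backbone.parameters():        # freeze: representations are plugged in as-is
    p.requires_grad = False

act_dim, feat_dim = 4, 512
dynamics = nn.Linear(feat_dim + act_dim, feat_dim)

frames = torch.randn(8, 3, 224, 224)       # current observations (fake batch)
next_frames = torch.randn(8, 3, 224, 224)  # next observations
actions = torch.randn(8, act_dim)

with torch.no_grad():
    z, z_next = backbone(frames), backbone(next_frames)

# Latent-consistency loss: predict the next representation directly,
# instead of reconstructing pixels through a decoder (as Dreamer does).
pred_next = dynamics(torch.cat([z, actions], dim=-1))
loss = nn.functional.mse_loss(pred_next, z_next)
loss.backward()
```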
In addition, building on imitation learning (IL) and RL, they proposed a three-stage algorithm. The results show that it achieved SOTA (state-of-the-art) results on 21 hard visual motor control tasks, including Adroit dexterous manipulation, Meta-World, and the DeepMind Control suite.
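For orientation, the three phases are, roughly: (1) policy pretraining via behavior cloning on the demonstrations, (2) "seeding" the world model with the demonstrations plus rollouts from that policy, and (3) interactive learning that oversamples demonstrations. The sketch below shows only this control flow; every helper is a no-op placeholder, not the released implementation.

```python
def behavior_clone(demos):                    # Phase 1: pretrain a policy on demos (stub)
    return {"policy": "bc"}

def collect_rollouts(policy, env, n=5):       # gather trajectories with the current policy (stub)
    return [f"traj_{i}" for i in range(n)]

def pretrain_world_model(data):               # fit latent dynamics on demos + rollouts (stub)
    return {"world_model": len(data)}

def model_based_update(policy, model, batch): # planning / TD-style updates (stub)
    return policy, model

def oversample_demos(buffer, demos):          # bias each batch toward demonstrations (stub)
    return demos + buffer[: len(demos)]

def modem_style_training(demos, env=None, num_iters=3):
    policy = behavior_clone(demos)                       # Phase 1: policy pretraining
    seed = collect_rollouts(policy, env)                 # Phase 2: seeding the world model
    model = pretrain_world_model(demos + seed)
    buffer = demos + seed
    for _ in range(num_iters):                           # Phase 3: interactive learning
        buffer += collect_rollouts(policy, env)
        policy, model = model_based_update(policy, model, oversample_demos(buffer, demos))
    return policy, model

policy, model = modem_style_training(demos=[f"demo_{i}" for i in range(5)])
```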
In terms of the numbers, MoDem far outperforms other models across these tasks, with results 150% to 250% higher than the previous SOTA methods.
The red line shows MoDem’s performance in various tasks
In the process, they also shed light on the importance of different stages in MoDem, the importance of data augmentation for visual MBRL, and the utility of pre-trained visual representations.
Finally, using frozen R3M features works far better than the direct end-to-end (E2E) approach. This is exciting, as it shows that visual pre-training on video can support world models.
That said, E2E training with strong data augmentation is competitive with frozen R3M, so there is likely still room to do better through pre-training.
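The "strong data augmentation" referred to here usually means random shifts/crops of image observations in visual RL. Below is a generic illustration of that kind of augmentation (a DrQ/RAD-style random shift, not necessarily the exact recipe MoDem uses).

```python
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Pad each image with replicated borders, then crop back at a random offset per sample."""
    n, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

batch = torch.rand(8, 3, 84, 84)   # fake image observations
augmented = random_shift(batch)    # fed to the encoder during E2E training
```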