


Sweeping 99 subtasks with MoE! Zhejiang University and collaborators propose GeRM, a new generalist robot policy
Multi-task robot learning is important for handling diverse and complex scenarios. However, current methods are held back by limited performance and by the difficulty of collecting training datasets.
This paper proposes GeRM (General Robot Model). The researchers use offline reinforcement learning to optimize the data utilization strategy, learning from both demonstrations and sub-optimal data and thereby surpassing the limitations of human demonstrations.
Authors: Song Wenxuan, Zhao Han, Ding Pengxiang, Cui Can, Lu Shangke, Fan Yaning, Wang Donglin
Affiliations: Westlake University, Zhejiang University
Paper address: https://arxiv.org/abs/2403.13358
Project address: https://songwxuan.github.io/GeRM/
A Transformer-based vision-language-action (VLA) model then processes the multimodal input and outputs actions.
By introducing a mixture-of-experts (MoE) structure, GeRM achieves faster inference together with higher overall model capacity. This addresses the limited parameter count of reinforcement learning models and improves multi-task performance while keeping computational cost under control.
A series of experiments shows that GeRM outperforms the other methods on all tasks and confirms its efficiency in both training and inference.
The researchers also provide the QUARD-Auto dataset to support training. It was built following the new automated data collection paradigm proposed in the paper, which lowers the cost of collecting robot data and helps drive progress in the multi-task learning community.
Main contributions:
1. Proposed, for the first time, a mixture-of-experts model for quadruped reinforcement learning. It is trained on mixed-quality data and has the potential to learn the optimal policy.
2. Compared with existing methods, GeRM achieves a higher success rate while activating only 1/2 of its parameters, exhibits emergent capabilities, and demonstrates a better data utilization strategy during training.
3. Proposed a paradigm for fully automatic robot dataset collection, and collected a large-scale open-source dataset.
Method
The GeRM network structure is shown in Figure 1. The vision-language input, which includes both demonstration data and failure data, passes through the encoder and tokenizer respectively and is then fed into an 8-layer decoder with a mixture-of-experts structure. The decoder generates action tokens, which are converted into discrete robot actions and deployed on the robot through a low-level policy. The model is trained with reinforcement learning.
Figure 1 GeRM network structure diagram
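The article does not say how action tokens are mapped to robot commands. One common approach in token-based robot policies is uniform binning of each action dimension, and the sketch below illustrates that idea only. The bin count of 256, the value ranges, and the function names `tokenize` and `detokenize` are assumptions made for illustration, not details taken from GeRM.

```python
import numpy as np

NUM_BINS = 256  # assumption: bin count chosen for illustration, not from the article


def tokenize(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer token in [0, num_bins - 1]."""
    action = np.clip(action, low, high)
    scaled = (action - low) / (high - low)  # normalised to [0, 1]
    # The upper bound itself would land in bin `num_bins`, so cap it at the last bin.
    return np.minimum((scaled * num_bins).astype(int), num_bins - 1)


def detokenize(tokens, low, high, num_bins=NUM_BINS):
    """Map integer tokens back to the centre value of their bin."""
    return low + (tokens + 0.5) / num_bins * (high - low)


# Hypothetical 3-dimensional action with hypothetical per-dimension limits.
low = np.array([-1.0, -1.0, 0.0])
high = np.array([1.0, 1.0, 2.0])
action = np.array([0.3, -0.7, 1.2])

tokens = tokenize(action, low, high)
recovered = detokenize(tokens, low, high)
print(tokens, recovered)
```

With uniform bins, the round-trip error in each dimension is at most half a bin width, so a finer discretization trades a larger token vocabulary for more precise commands.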
The GeRM decoder is a Transformer decoder in which the feed-forward network (FFN) of each layer is chosen from a set of 8 distinct expert networks.
At each layer, for each token, the gating network selects two experts to process the token and combines their outputs as a weighted sum.
Different experts specialize in different tasks or different action dimensions and handle different scenarios, so that together they form a single model shared across multiple tasks. This architecture increases the number of network parameters while keeping the computational cost essentially unchanged.
Figure 2 Decoder structure diagram
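The routing described above (8 experts, two selected per token, outputs combined by weight) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the class name `Top2MoE`, the layer sizes, the ReLU experts, and the choice to apply softmax over only the two selected gate logits are all assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class Top2MoE:
    """Sparse MoE feed-forward layer: each token is processed by its top-k experts."""

    def __init__(self, d_model, d_hidden, num_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.num_experts = num_experts
        self.top_k = top_k
        self.w_gate = rng.normal(0.0, 0.02, (d_model, num_experts))
        self.w_in = rng.normal(0.0, 0.02, (num_experts, d_model, d_hidden))
        self.w_out = rng.normal(0.0, 0.02, (num_experts, d_hidden, d_model))

    def __call__(self, tokens):
        # tokens: (num_tokens, d_model)
        logits = tokens @ self.w_gate                           # (num_tokens, num_experts)
        top_idx = np.argsort(logits, axis=-1)[:, -self.top_k:]  # indices of the top-k experts
        top_logits = np.take_along_axis(logits, top_idx, axis=-1)
        weights = softmax(top_logits, axis=-1)                  # assumption: normalise over selected experts only

        out = np.zeros_like(tokens)
        for e in range(self.num_experts):
            # Rows routed to expert e, and which of their top-k slots it occupies.
            token_ids, slot = np.nonzero(top_idx == e)
            if token_ids.size == 0:
                continue
            hidden = np.maximum(tokens[token_ids] @ self.w_in[e], 0.0)  # ReLU expert FFN
            out[token_ids] += weights[token_ids, slot][:, None] * (hidden @ self.w_out[e])
        return out, top_idx, weights


layer = Top2MoE(d_model=16, d_hidden=32)
tokens = np.random.default_rng(1).normal(size=(5, 16))
out, chosen, weights = layer(tokens)
print(out.shape, chosen.shape)  # (5, 16) (5, 2)
```

In this sketch all 8 experts hold parameters, but each token only passes through 2 of them, which is why the layer's capacity grows with the number of experts while the per-token FFN compute depends only on the number of selected experts.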
We propose an automated paradigm for collecting multimodal robot data. Using it, we constructed QUARD-Auto, a large-scale robotics dataset that combines demonstration data with sub-optimal data. It covers 5 tasks and 99 subtasks, with a total of 257k trajectories. We will open-source it to support the development of the robotics community.
Table 1 Overview of the dataset
Figure 3 Data volume statistics
Experiments
We conducted a comprehensive series of experiments covering all 99 subtasks, each of which was tested on 400 trajectories.
As shown in Table 2, GeRM has the highest success rate on all tasks. Compared with RT-1 and with other variants of GeRM, it learns effectively from mixed-quality data, outperforms the other methods, and shows superior capability across multiple tasks. At the same time, the MoE module balances computational cost and performance by activating only a subset of the parameters during inference.
Table 2 Multi-task comparison experiment
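With 99 subtasks and 400 trajectories per subtask, the evaluation amounts to 99 × 400 = 39,600 rollouts. The sketch below shows one plausible way to turn such rollouts into per-task success rates, by first averaging within each subtask and then across the subtasks of a task. The function name `success_rates`, the record layout, and the task labels are hypothetical, and the article does not state how the authors aggregate their numbers.

```python
from collections import defaultdict


def success_rates(results):
    """Aggregate (task, subtask, succeeded) records into a success rate per task.

    Each subtask is averaged first, then subtasks are averaged within their task,
    so every subtask carries equal weight regardless of how many rollouts it has.
    """
    per_subtask = defaultdict(lambda: [0, 0])  # (task, subtask) -> [successes, trials]
    for task, subtask, succeeded in results:
        counts = per_subtask[(task, subtask)]
        counts[0] += int(succeeded)
        counts[1] += 1

    per_task = defaultdict(list)
    for (task, _subtask), (successes, trials) in per_subtask.items():
        per_task[task].append(successes / trials)

    return {task: sum(rates) / len(rates) for task, rates in per_task.items()}


# Hypothetical records: one task with two subtasks, four rollouts each.
records = (
    [("task_a", "subtask_1", True)] * 3
    + [("task_a", "subtask_1", False)]
    + [("task_a", "subtask_2", True)] * 2
    + [("task_a", "subtask_2", False)] * 2
)
rates = success_rates(records)
print(rates)  # {'task_a': 0.625}
```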
GeRM is also efficient to train. Compared with the other methods, it reaches a very low loss and a high success rate after only a few batches, which highlights its ability to optimize the data utilization strategy.
Figure 4 Success rate and loss curves
GeRM demonstrates an emergent ability for dynamic adaptive path planning. As shown in the video, the quadruped robot has a limited field of view at its initial position, which makes it difficult to determine which direction to move in. To avoid the obstacle, it turns left, a choice made at random.
Subsequently, after encountering erroneous visual input, the robot performs a substantial reorientation to align with the correct target, which lay outside its original field of view. It then continues toward its destination and ultimately completes its mission.
It is worth noting that such trajectories lie outside the distribution of the training dataset. This demonstrates GeRM's emergent capability for dynamic adaptive path planning within a scene, that is, its ability to make decisions based on visual perception, plan future paths, and change its next steps as needed.
Figure 5 Emergent Capability