


Google's robot follows interactive language commands with a 93.5% success rate, and the open-source dataset grows tenfold.
Watch closely: the man in the demo keeps giving a robot natural language instructions, such as "Push the green star between the red blocks" or "Move the blue block to the lower left corner", and the robot carries out each command in real time.
Since the 1960s, robotics experts have been trying to make robots understand people's "natural language instructions" and perform specific actions.
Ideally, future robots will react in real time to any relevant task that users can describe in natural language.
Especially in open human environments, users may need to customize the robot's behavior as it happens, giving quick corrections such as "stop, move the arm up a little" or specifying constraints such as "move slowly to the right".
In addition, real-time language can make it easier for people and robots to collaborate on complex, long-horizon tasks: people can guide the robot's operation iteratively and interactively, with occasional verbal feedback.
Current related work can be roughly grouped around the following three requirements:
1. The robot must exist in the real world;
2. It must be able to respond to a large number of rich natural language commands;
3. It must be able to execute interactive language commands, that is, the robot needs to accept new natural language instructions while a task is being executed.
Regarding the third point, progress on interactivity in robotics is still very slow, which also leaves robots lacking a "sense of life".
Recently, Google published a paper proposing a brand-new framework for building real-world, real-time interactive robots that execute natural language instructions. The related datasets, environments, benchmarks, and policies are all available.
Paper link: https://arxiv.org/pdf/2210.06407.pdf
Project homepage: https://interactive-language.github.io/
Trained with behavioral cloning on a dataset of hundreds of thousands of language-annotated trajectories, the resulting policy can proficiently execute an order of magnitude more commands than previous work. In the real world, the researchers estimate that the method achieves a 93.5% success rate on a set of 87,000 distinct natural language strings.
The same policy can also be guided by humans in real time via natural language to achieve a wide range of precise, long-horizon rearrangement goals, such as "make a smiley face out of blocks".
The data set released with the paper includes nearly 600,000 language-tagged trajectories, which is an order of magnitude larger than previously available data sets.
Interactive Language: Real-time Conversation with the Robot
To integrate robots into the real world, the most important thing is being able to handle open-ended natural language instructions. From a machine learning perspective, however, getting robots to learn open-vocabulary language is a huge challenge.
Open-vocabulary models need to perform a large number of tasks, including small corrective instructions. Existing multi-task learning setups rely on carefully designed imitation learning datasets or complex reinforcement learning reward functions to drive the learning of each task, and a predefined task set designed this way is bound to be fairly small.
Therefore, a key question for open-vocabulary tasks is: how can robot data collection be scaled to cover thousands of behaviors in real environments, and how can all of these behaviors be connected to the natural language instructions that end users might actually provide?
In Interactive Language, the key to Google's large-scale imitation learning framework is a scalable way to create large, diverse, language-conditioned robot demonstration datasets.
Unlike previous setups, where all skills are defined up front and curated demonstrations are then collected for each skill, the researchers collect data continuously across multiple robots, without scene resets or low-level skill segmentation.
All data, including failures (such as knocking blocks off the table), goes through a hindsight language relabeling process to be paired with text.
In this process, annotators watch long robot videos, identify as many behaviors as possible, mark the start and end time of each behavior, and describe each segment in free-form natural language.
Most importantly, in contrast to previous instruction-following setups, all skills used for training emerge bottom-up from the data itself, rather than being determined in advance by the researchers.
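The relabeling step described above can be pictured as slicing one long recording into many short, text-paired clips. The following is only a minimal sketch under assumed names: the `Annotation` class, the `relabel` function, the frame strings, and the example timings are all hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    """One annotator label: a time span in the long recording plus free-form text."""
    start_s: float  # start time of the behavior, in seconds
    end_s: float    # end time of the behavior, in seconds
    text: str       # unrestricted natural language description


def relabel(frames: List[str], fps: float,
            annotations: List[Annotation]) -> List[Tuple[List[str], str]]:
    """Cut a long, unsegmented recording into (short trajectory, instruction) pairs."""
    pairs = []
    for ann in annotations:
        first = int(ann.start_s * fps)
        last = int(ann.end_s * fps)
        segment = frames[first:last]
        if segment:  # skip spans that contain no frames
            pairs.append((segment, ann.text))
    return pairs


# Hypothetical 4-second recording at 5 frames per second (20 frames).
frames = [f"frame_{i}" for i in range(20)]
annotations = [
    Annotation(0.0, 2.0, "push the green star between the red blocks"),
    Annotation(2.0, 3.0, "move the blue block to the lower left corner"),
    Annotation(3.0, 4.0, "knock the blue block off the table"),  # failures get labels too
]
pairs = relabel(frames, fps=5.0, annotations=annotations)
```

Because the text is attached after the fact, nothing in this sketch needs a predefined skill list: whatever the annotator describes becomes a training example.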
The researchers intentionally kept the learning method and architecture as simple as possible. The robot policy network is a cross-attention Transformer that maps 5 Hz video and text to 5 Hz robot actions, trained with a standard supervised-learning behavioral cloning objective and no auxiliary losses.
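To make the two ingredients named here concrete, the sketch below shows single-head scaled dot-product cross-attention and a supervised behavioral cloning loss in plain Python. It is a toy illustration, not the paper's network: the direction of attention (text queries over video features), the mean squared error loss, the absence of learned projection layers, and all input values are assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention: each query mixes the values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


def behavior_cloning_loss(predicted: List[Vector], demonstrated: List[Vector]) -> float:
    """Mean squared error between predicted and demonstrated actions (assumed loss)."""
    errors = [(p - d) ** 2
              for pred, demo in zip(predicted, demonstrated)
              for p, d in zip(pred, demo)]
    return sum(errors) / len(errors)


# Hypothetical toy inputs: 2 text-token queries attending over 3 video-frame features.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
video_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_values = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
attended = cross_attention(text_queries, video_keys, video_values)

# Treat the attended features as stand-in 2-D action predictions (an assumption;
# a real policy would decode actions with further learned layers).
demo_actions = [[0.1, 0.0], [0.0, 0.1]]
loss = behavior_cloning_loss(attended, demo_actions)
```

The point of the sketch is the training signal: the only objective is the gap between the predicted action and the action in the demonstration.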
At test time, new natural language commands can be sent to the policy network via speech-to-text at rates of up to 5 Hz.
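One way to picture real-time command updates is a fixed-rate control loop that checks for a new transcribed command before every policy step. This is a hedged sketch only: `run_episode`, `toy_policy`, and the dictionary of simulated speech-to-text output are hypothetical stand-ins, and the loop does not sleep or talk to real hardware.

```python
from typing import Callable, Dict, List, Tuple

CONTROL_HZ = 5  # rate stated in the article; one policy step every 0.2 s

Action = Tuple[float, float]


def run_episode(policy: Callable[[str, int], Action],
                commands_by_step: Dict[int, str],
                initial_command: str,
                num_steps: int) -> List[Tuple[float, str, Action]]:
    """Step the policy at a fixed rate, swapping in any new command before each step."""
    command = initial_command
    log = []
    for step in range(num_steps):
        # A new transcribed utterance, if any, replaces the active command immediately.
        command = commands_by_step.get(step, command)
        action = policy(command, step)
        log.append((step / CONTROL_HZ, command, action))  # (elapsed seconds, text, action)
    return log


def toy_policy(command: str, step: int) -> Action:
    """Hypothetical stand-in policy: returns a 2-D motion based on a keyword."""
    if "right" in command:
        return (0.01, 0.0)
    if "up" in command:
        return (0.0, 0.01)
    return (0.0, 0.0)


# Simulated speech-to-text output keyed by control step (step 5 is 1 s into the episode).
spoken = {5: "move slowly to the right"}
log = run_episode(toy_policy, spoken, "move the arm up a little", num_steps=10)
```

Since the command is re-read on every step, a correction spoken mid-task takes effect on the very next action rather than after the current task finishes.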
Open Source Benchmark
Through this annotation process, the researchers collected the Language-Table dataset, which contains more than 440,000 real and 180,000 simulated demonstrations of a robot executing natural language commands, together with the sequence of actions the robot took during each demonstration.
It is also currently the largest language-conditioned robot demonstration dataset, an order of magnitude larger than previous ones.
Language-Table comes with a simulated imitation learning benchmark, which can be used for model selection or to evaluate how well robots trained with different methods follow instructions.
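As a rough illustration of how a benchmark like this supports model selection, the sketch below compares per-episode success outcomes for two policies and keeps the better one. The function names, policy names, and outcome lists are made up for the example; they are not results from the paper and do not reflect the benchmark's actual API.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of evaluation episodes in which the instruction was completed."""
    return sum(outcomes) / len(outcomes)


def select_best(results: Dict[str, List[bool]]) -> str:
    """Pick the policy name with the highest success rate on the benchmark."""
    return max(results, key=lambda name: success_rate(results[name]))


# Made-up per-episode outcomes for two hypothetical policies; not results from the paper.
results = {
    "policy_a": [True, True, False, True],
    "policy_b": [True, False, False, True],
}
best = select_best(results)
```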
Real-time language behavior learning
In experiments, the researchers found that robots become particularly capable when they can follow natural language instructions given in real time.
On the project website, the researchers demonstrate that users can guide the robot through complex long-horizon sequences using only natural language, reaching goals that require precise, coordinated control over a long period.
For example, if there are many blocks on the table, the command can be "Make a smiley face with green eyes" or "Place them all in a vertical line", and so on.
Because the robot was trained to follow open-vocabulary language, the experiments show it responding to a range of different verbal corrections, such as "move the red star gently to the right".
Finally, the researchers explored other advantages of real-time language, such as making robot data collection more efficient. A single human operator can control four robots at the same time using spoken language, which could make it possible to scale up robot data collection in the future without assigning an annotator to each robot.
Conclusion
Although the project is currently limited to a fixed set of objects on a tabletop, the Interactive Language results offer early evidence that large-scale imitation learning can indeed produce real-time interactive robots capable of following free-form end-user commands.
To advance real-time language control for physical robots, the researchers have open-sourced Language-Table, currently the largest language-conditioned real-world robot demonstration dataset, along with a related simulated benchmark.
The researchers believe that this dataset may be useful beyond robot control: it could also be used to study language- and action-conditioned video prediction and robot-video-conditioned language modeling, or serve as a new starting point for many other interesting and active problems in the broader machine learning context.