Have you ever wanted to tell a robot what to do with your own words, like you would a human?
For example, just tell your home assistant robot "Please heat up my lunch" and it will find the microwave on its own. Amazing, right?
Language is the most intuitive way for humans to express their intentions, yet for a long time people have relied heavily on handwritten code to control robots. With the arrival of ChatGPT, that may be starting to change.
In a recent study, a Microsoft team explored how OpenAI’s new AI language model, ChatGPT, can be used to make natural human-robot interaction possible.
Paper link: https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf
ChatGPT is a language model trained on a large corpus of text and human interactions, so it can generate coherent, grammatically correct responses to a wide range of prompts and questions. The goal of this research is to see whether ChatGPT can think beyond text and reason about the physical world to help robots complete tasks. The researchers hope this will make it easier for people to interact with robots without having to learn complex programming languages or the details of robotic systems.
The key challenge of the research is to teach ChatGPT how to solve problems while taking into account the laws of physics, the operating environment, and the way the robot's physical actions change its surroundings.
It turns out that ChatGPT can do a lot on its own, but it still needs some help. In the paper, the team describes a series of design principles that can be used to guide language models in solving robotic tasks, including (but not limited to) special prompt structures, high-level APIs, and human feedback given as text. The researchers believe this work is just the beginning of a shift in how robotic systems are developed, and they hope the study will inspire other researchers to join this interesting area.
Today's robotics workflow starts with an engineer or technical user who has to translate the task requirements into code for the system. The engineer stays in the loop, constantly writing new code and specifications to correct the robot's behavior. Overall, the process is slow (the user has to write low-level code), expensive (it requires highly skilled users with in-depth knowledge of robotics), and inefficient (it takes multiple rounds of interaction to get things working properly).
ChatGPT opens up a new robotics paradigm, one that allows a potentially non-technical user to sit in the loop, monitoring the robot's performance while giving high-level feedback to the large language model (LLM). By following the design principles from the study, ChatGPT can generate code for robotic scenarios. Without any fine-tuning, the study draws on the LLM's knowledge to control robots of different form factors across a variety of tasks. In their work, the researchers demonstrated multiple examples of ChatGPT solving robotics puzzles, as well as complex robot deployments in the manipulation, aerial, and navigation domains.
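Prompting LLMs is a highly empirical science. Through trial and error, the researchers arrived at a set of methods and design principles for writing prompts for robot tasks: expose a high-level API, describe the task and the available functions in a structured prompt, and keep the user in the loop to review the output and give feedback in plain text.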
If the user is satisfied with the solution, the code can finally be deployed to the robot.
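The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the function names in `ROBOT_API`, the prompt wording, and the `ask_llm` callable are all hypothetical, and a canned stand-in (`fake_llm`) replaces a real model call so the sketch runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical high-level robot API: signature -> plain-language description.
# These names are illustrative only and do not come from the paper.
ROBOT_API: Dict[str, str] = {
    "get_position(object_name)": "returns the (x, y, z) position of a named object",
    "fly_to(x, y, z)": "flies the drone to the given coordinates",
    "take_photo()": "captures an image with the onboard camera",
}


def build_prompt(task: str, api: Dict[str, str]) -> str:
    """Compose a prompt stating the task and the only functions allowed."""
    lines = ["You are helping to control a robot.", "Use only these functions:"]
    lines += [f"- {signature}: {description}" for signature, description in api.items()]
    lines += ["Ask a clarifying question if the task is ambiguous.", f"Task: {task}"]
    return "\n".join(lines)


def refine(task: str,
           ask_llm: Callable[[List[str]], str],
           feedback: List[str]) -> str:
    """Run the conversation: one initial prompt, then one turn per feedback note.

    `ask_llm` receives the whole transcript so far and returns the next reply.
    The last reply is the candidate code the user would inspect or simulate.
    """
    transcript = [build_prompt(task, ROBOT_API)]
    reply = ask_llm(transcript)
    for note in feedback:
        transcript += [reply, note]
        reply = ask_llm(transcript)
    return reply


def fake_llm(transcript: List[str]) -> str:
    """Stand-in for a real model call (assumption: replies are plain code)."""
    if len(transcript) == 1:
        return "fly_to(*get_position('shelf'))"
    return "fly_to(*get_position('shelf'))\ntake_photo()"


if __name__ == "__main__":
    code = refine("Inspect the shelf", fake_llm, ["Also take a photo once you arrive."])
    print(code)
```

Passing the full transcript on every turn is a deliberate choice in this sketch: it lets each piece of feedback build on the earlier replies, which mirrors the conversational correction the article describes.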
Let’s look at a few examples. More case studies are available in the code repository.
The researchers had ChatGPT control a real drone, and it proved to be a very intuitive language-based interface between a non-technical user and the robot. When the user's instructions were ambiguous, ChatGPT asked clarifying questions, and it wrote complex code structures for the drone, such as a zigzag pattern for visual inspection. It even learned to take selfies!
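To make the zigzag pattern concrete, here is a minimal sketch of the kind of waypoint generator such a conversation might produce. It is not the code from the study: the function name, its parameters, and the coordinate convention are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float, float]


def zigzag_waypoints(width: float, height: float, rows: int,
                     standoff: float = 1.0) -> List[Waypoint]:
    """Return waypoints sweeping a vertical face of size width x height.

    Assumed convention: x runs along the face, y is the distance kept
    from it, z is altitude. Rows are evenly spaced from z=0 to z=height,
    and the sweep direction alternates on each row.
    """
    if rows < 2:
        raise ValueError("need at least two rows to form a zigzag")
    step = height / (rows - 1)
    waypoints: List[Waypoint] = []
    for row in range(rows):
        z = row * step
        # Even rows sweep left to right, odd rows right to left.
        xs = (0.0, width) if row % 2 == 0 else (width, 0.0)
        waypoints += [(x, standoff, z) for x in xs]
    return waypoints


if __name__ == "__main__":
    for point in zigzag_waypoints(width=4.0, height=2.0, rows=3):
        print(point)
```

Each waypoint would then be handed to whatever "fly to" function the robot's high-level API exposes.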
The researchers also used ChatGPT in a simulated industrial inspection scenario built on the Microsoft AirSim simulator. The model was able to parse the user's high-level intent and geometric cues effectively and to control the drone accurately.
## User in the loop: when a complex task needs a conversation
Next, the researchers used ChatGPT in a robot-arm manipulation scenario, using conversational feedback to teach the model how to compose the APIs it was initially given into more complex high-level functions, which ChatGPT coded by itself. Using a curriculum-based strategy, the model was able to chain these learned skills together logically to perform operations such as stacking blocks.
The model also gave a striking example of bridging the textual and physical domains when it was asked to build the Microsoft logo out of wooden blocks. It was able not only to recall the logo from its internal knowledge base, but also to "draw" it (as SVG code) and then use the skills learned above to work out which of the robot's existing actions could compose its physical form.
Next, the researchers asked ChatGPT to write an algorithm that lets a drone reach a goal in mid-air without hitting obstacles. They told the model that the drone had a forward-facing distance sensor, and ChatGPT immediately coded most of the key building blocks of the algorithm. The task required some conversation with a human, and ChatGPT's ability to make localized code improvements from language feedback alone was impressive.
## Perception-action loop: the robot senses the world before acting
The ability to sense the world (perception) before doing something (action) is fundamental to any robotic system. The researchers therefore tested ChatGPT's understanding of this concept by asking it to explore an environment until it found a user-specified object. They gave the model access to functions such as object-detection and object-distance APIs, and verified that the code it generated successfully implemented a perception-action loop.
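The perception-action loop described above can be illustrated with a toy simulation. Everything here is a sketch under assumptions: `SimRobot`, its method names, and the scan-then-approach strategy are hypothetical stand-ins for the object-detection and object-distance functions mentioned in the article, not the APIs or the generated code from the study.

```python
import math
from typing import Dict, Tuple


class SimRobot:
    """Toy 2D robot standing in for a real platform (entirely hypothetical)."""

    def __init__(self, objects: Dict[str, Tuple[float, float]],
                 fov_deg: float = 60.0) -> None:
        self.objects = objects
        self.half_fov = math.radians(fov_deg) / 2
        self.x, self.y, self.heading = 0.0, 0.0, 0.0

    def _bearing(self, name: str) -> float:
        """Angle to the object relative to the heading, wrapped to [-pi, pi]."""
        ox, oy = self.objects[name]
        angle = math.atan2(oy - self.y, ox - self.x) - self.heading
        return math.atan2(math.sin(angle), math.cos(angle))

    def detect_objects(self) -> Dict[str, float]:
        """Visible objects mapped to their bearing in degrees."""
        bearings = {name: self._bearing(name) for name in self.objects}
        return {n: math.degrees(b) for n, b in bearings.items()
                if abs(b) <= self.half_fov}

    def distance_to(self, name: str) -> float:
        ox, oy = self.objects[name]
        return math.hypot(ox - self.x, oy - self.y)

    def turn(self, degrees: float) -> None:
        self.heading += math.radians(degrees)

    def forward(self, meters: float) -> None:
        self.x += meters * math.cos(self.heading)
        self.y += meters * math.sin(self.heading)


def find_object(robot: SimRobot, target: str, stop_distance: float = 0.5,
                step: float = 0.5, max_steps: int = 200) -> bool:
    """Sense, then act, until the target is within stop_distance."""
    tolerance = 1e-6  # absorbs floating-point error in the distance check
    for _ in range(max_steps):
        seen = robot.detect_objects()            # perceive
        if target not in seen:
            robot.turn(45.0)                     # act: keep scanning
            continue
        distance = robot.distance_to(target)     # perceive
        if distance <= stop_distance + tolerance:
            return True
        robot.turn(seen[target])                 # act: face the target
        robot.forward(min(step, distance - stop_distance))
    return False


if __name__ == "__main__":
    robot = SimRobot({"chair": (3.0, 4.0), "plant": (-2.0, 1.0)})
    print(find_object(robot, "chair"))
```

The `max_steps` bound is there so the loop gives up when the target is not in the scene at all, rather than spinning forever.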
In further experiments, the researchers evaluated whether ChatGPT could decide in real time where the robot should go based on sensor feedback, rather than having ChatGPT generate a code loop that makes these decisions. Interestingly, they found that a textual description of the camera image could be fed into the conversation at each step, and the model was able to work out how to steer the robot until it reached a specific object.
## PromptCraft: a collaborative open-source tool for LLM and robotics research
Good prompt engineering is crucial to the success of large language models such as ChatGPT on robotic tasks. Unfortunately, prompting is an empirical science, and there is a lack of comprehensive, accessible resources with a varied set of examples to help researchers and enthusiasts in the field. To bridge this gap, the researchers introduced "PromptCraft", a collaborative open-source platform where anyone can share examples of prompting strategies for different categories of robots. They have released all the prompts and conversations used in this study. Beyond prompt design, they also hope to include multiple robot simulators and interfaces that let users test the algorithms ChatGPT generates. As a start, they have released an AirSim environment integrated with ChatGPT that anyone can use to develop these ideas.
Figure: ChatGPT-AirSim interface
The release of these technologies is worth celebrating, because it will bring robotics to a wider audience. The Microsoft researchers believe that language-based robot control will lay the foundation for bringing robots out of science labs and into the lives of everyday users.
It is worth emphasizing that the output of ChatGPT is not meant to be deployed directly on a robot without careful analysis. The researchers encourage users to harness the power of simulation to evaluate these algorithms before any real-life deployment, and to always take the necessary safety precautions. The work described here represents only a small fraction of what is possible at the intersection of large language models and robotics, and the researchers hope it will inspire more research.
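As one small, concrete form of "careful analysis", generated code can be screened statically before it ever reaches a simulator. The sketch below is an assumption of this rewrite, not something the study describes: it uses Python's standard `ast` module to flag calls to functions outside a hypothetical allowlist (`ALLOWED_CALLS`). It is a partial check only; it does not catch imports or unsafe logic, and it is no substitute for simulation or human review.

```python
import ast
from typing import Set

# Hypothetical allowlist: the high-level functions the prompt exposed.
ALLOWED_CALLS: Set[str] = {"fly_to", "get_position", "take_photo"}


def disallowed_calls(source: str, allowed: Set[str]) -> Set[str]:
    """Return the names of called functions that are not on the allowlist.

    Plain calls such as `name(...)` and attribute calls such as
    `obj.name(...)` are checked by name; any other call target is
    reported as "<dynamic>".
    """
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = "<dynamic>"
        if name not in allowed:
            found.add(name)
    return found


if __name__ == "__main__":
    generated = "fly_to(*get_position('shelf'))\ntake_photo()"
    print(disallowed_calls(generated, ALLOWED_CALLS))
```

An empty result means every call in the generated code is on the allowlist; anything else is a reason to stop and look more closely before running it.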
Original link: https://www.microsoft.com/en-us/research/group/autonomous-systems-group-robotics/articles/chatgpt-for-robotics/