
The Beihang University team proposes a new embodied intelligence architecture that lets large models control drones

Dec 15, 2023 am 10:49 AM

Entering the multi-modal era, large models can also control drones!

When the vision module captures the start condition, the large model acting as the "brain" generates action instructions, which the drone then executes quickly and accurately.

Researchers from the intelligent drone team at Beijing University of Aeronautics and Astronautics (Beihang University), led by Professor Zhou Yaoming, have proposed an embodied intelligence architecture based on multi-modal large models.

This architecture has already been used to control unmanned aerial vehicles. How does the new agent perform, and what are the technical details?

"Agent is the brain"

The research team uses large models to understand multi-modal data, integrating multi-source information from the real physical world such as photos, sounds, and sensor data, so that the agent can perceive its surroundings and act accordingly.

At the same time, the team proposed a control architecture called "Agent as Cerebrum, Controller as Cerebellum" (the agent is the brain, the controller is the cerebellum):

As the brain's decision generator, the agent focuses on generating high-level behaviors.

As the cerebellum's motion controller, the controller is mainly responsible for converting high-level behaviors (such as desired target points) into low-level system commands (such as rotor speeds).
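The division of labor can be illustrated with a minimal Python sketch. It is not the team's implementation: the names `TargetPoint`, `VelocityCommand`, `agent_decide` and `controller_step` are hypothetical, the lookup table stands in for a multi-modal model, and a simple proportional law that outputs velocity commands stands in for a real flight controller, which would go on to compute rotor speeds.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TargetPoint:
    """High-level behavior emitted by the agent: a desired position in metres."""
    x: float
    y: float
    z: float


@dataclass
class VelocityCommand:
    """Lower-level command produced by the controller, in metres per second."""
    vx: float
    vy: float
    vz: float


def agent_decide(observation: str) -> TargetPoint:
    """Stand-in for the 'cerebrum': map a perceived situation to a target point.

    Assumption: a real agent would query a multi-modal model here;
    this lookup table is purely illustrative.
    """
    plans = {
        "fire_spotted": TargetPoint(10.0, 0.0, 5.0),
        "idle": TargetPoint(0.0, 0.0, 2.0),
    }
    return plans.get(observation, plans["idle"])


def controller_step(
    position: Tuple[float, float, float],
    target: TargetPoint,
    gain: float = 0.5,
    max_speed: float = 2.0,
) -> VelocityCommand:
    """Stand-in for the 'cerebellum': proportional control with a speed limit."""
    vx = gain * (target.x - position[0])
    vy = gain * (target.y - position[1])
    vz = gain * (target.z - position[2])
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    if speed > max_speed:
        # Scale the whole vector so the direction is preserved.
        scale = max_speed / speed
        vx, vy, vz = vx * scale, vy * scale, vz * scale
    return VelocityCommand(vx, vy, vz)
```

Calling `controller_step((0.0, 0.0, 5.0), agent_decide("fire_spotted"))` produces a command along the x axis capped at the speed limit. The agent only ever names a target; the low-level numbers are the controller's business.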

Specifically, the research team believes that this achievement has three main contributions.

New system architecture applied to actual situations

The research team proposed a new system architecture that can be applied to real robots. It treats the agent built on a multi-modal large model as the brain, and the robot's motion planner and controller as the cerebellum. The robot's perception system is analogous to human information-gathering organs such as the eyes and ears, and the robot's actuators are analogous to human effectors such as the hands.

△Figure 1 Hardware system architecture

These nodes are connected through ROS and communicate via ROS message publication and subscription or via service requests and responses.

Unlike traditional end-to-end large-model robot control, this architecture lets the Agent focus on generating high-level commands, making it more intelligent on high-level tasks and more robust and reliable in actual execution.

△Figure 2 Software system architecture

New Agent

Under this architecture, the authors built AeroAgent, an agent that serves as the brain.

The agent mainly consists of three parts:

An automatic plan generation module, which has multi-modal perception and monitoring capabilities and is good at handling emergencies while in standby mode.

A multi-modal data memory module, which supports multi-modal memory retrieval and reflection and gives the agent the ability to learn from few samples.

An embodied action module, which builds a bridge for stable control between the embodied agent and the other modules on ROS. It lets the agent access other ROS nodes through operations. Completing an action may require several interactions to obtain the necessary parameters from the sensors, so that the agent can stably output specific actions based on comprehensive situational awareness and the actuators available to it.

△Figure 3 AeroAgent module architecture

Bridge connecting large models and ROS
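One common way to implement retrieval from a memory of past experience is nearest-neighbor search over embeddings. Whether AeroAgent does this is not stated here, so the Python sketch below is an assumption. `EpisodeMemory` and the two-dimensional toy embeddings are hypothetical; a real system would embed images and text with a multi-modal encoder and pass the retrieved notes to the model as few-shot context.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EpisodeMemory:
    """Hypothetical store of (embedding, note) pairs from earlier missions."""

    def __init__(self) -> None:
        self._episodes: List[Tuple[List[float], str]] = []

    def add(self, embedding: Sequence[float], note: str) -> None:
        self._episodes.append((list(embedding), note))

    def retrieve(self, query: Sequence[float], k: int = 2) -> List[str]:
        """Return the notes of the k episodes most similar to the query."""
        ranked = sorted(
            self._episodes,
            key=lambda episode: cosine_similarity(query, episode[0]),
            reverse=True,
        )
        return [note for _, note in ranked[:k]]


memory = EpisodeMemory()
memory.add([1.0, 0.0], "fire: climb before approaching")
memory.add([0.0, 1.0], "landing: align with pad")
memory.add([0.9, 0.1], "fire: report location")
similar_notes = memory.retrieve([1.0, 0.0], k=2)
```

Here `similar_notes` holds the two fire-related notes, which could be prepended to the prompt as examples.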

To build a bridge between the embodied agent and the ROS robot system, so that operations generated by the Agent are sent to ROS correctly and stably and executed successfully by other nodes, and so that information provided by other nodes can be read and understood by the LMM, the team designed ROSchain, a bridge connecting LLMs/LMMs and ROS.

ROSchain simplifies the integration of large models with robot sensing devices, execution units, and control mechanisms through a set of modules and application programming interfaces (APIs), providing stable middleware through which agents can access the ROS system.
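The article does not describe ROSchain's actual interface, so the following Python sketch only illustrates the general idea of such a bridge: model output is parsed and validated before anything reaches the robot. `ActionBridge`, its methods and the `goto` action are hypothetical, and the handler is a plain function standing in for a ROS publisher or service call.

```python
import json
from typing import Any, Callable, Dict, Iterable, Set, Tuple

ActionHandler = Callable[[Dict[str, Any]], str]


class ActionBridge:
    """Hypothetical gatekeeper between model-generated text and robot actions."""

    def __init__(self) -> None:
        # action name -> (required parameter names, handler)
        self._actions: Dict[str, Tuple[Set[str], ActionHandler]] = {}

    def register(
        self, name: str, required_params: Iterable[str], handler: ActionHandler
    ) -> None:
        self._actions[name] = (set(required_params), handler)

    def dispatch(self, model_output: str) -> str:
        """Validate a JSON action request and run its handler, or explain the rejection."""
        try:
            request = json.loads(model_output)
        except json.JSONDecodeError as exc:
            return f"rejected: invalid JSON ({exc.msg})"
        if not isinstance(request, dict):
            return "rejected: expected a JSON object"
        name = request.get("action")
        if not isinstance(name, str) or name not in self._actions:
            return f"rejected: unknown action {name!r}"
        required, handler = self._actions[name]
        params = request.get("params", {})
        if not isinstance(params, dict):
            return "rejected: params must be a JSON object"
        missing = required - set(params)
        if missing:
            return f"rejected: missing parameters {sorted(missing)}"
        return handler(params)


def goto_handler(params: Dict[str, Any]) -> str:
    # Stand-in for publishing a setpoint to a flight-control node.
    return f"goto {params['x']},{params['y']},{params['z']}"


bridge = ActionBridge()
bridge.register("goto", ["x", "y", "z"], goto_handler)
```

Returning the rejection reason as text is a deliberate choice in this sketch: the message can be fed back to the model so it can correct the request on the next interaction.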

Why choose drones

The research team gave three reasons for choosing drones to test and simulate the system architecture.

First, most of the web-scale world knowledge contained in today's LMMs is from a third-person perspective, whereas embodied intelligence in fields such as humanoid robots resembles a human-centered first-person perspective. A drone's camera, especially a downward-looking one, is closer to a third-person (God's-eye) view for an embodied agent.

In addition, LMMs at the current stage, whether deployed as models or accessed as API services, are usually limited by computing resources, which introduces some response latency.

Because a UAV can hover, its mission planning can tolerate this latency, which remains an obstacle to applications in fields such as autonomous driving.

Together, these two points make UAVs, at the current level of technological development, suitable pioneers for verifying the relevant theories and applications.

Second, in industrial drone applications such as wildfire rescue, agriculture, forestry and plant protection, unmanned grazing, and power inspection, pilots and experts currently carry out operations together, so there is industrial demand for intelligent task execution.

Third, from the perspective of future development, multi-agent collaboration is clearly needed in logistics, construction, factories, and other fields. Here drones, as embodied intelligence with a "God's-eye view", are well suited to act as the leading central node that allocates tasks, while other robots can be regarded as part of the drone's actuators, so this research also has prospects for future development.

The team ran simulation experiments in the Airgen simulator, with DRL and other methods as control groups. The experimental results are as follows:

In the wildfire search and rescue scenario, AeroAgent achieved a standardized score of 100 points, averaging 2.04 points per step.

Agents that simply call an LLM and DRL-based agents scored only 29.4 points, averaging 0.2 per step, less than one-tenth of AeroAgent's.

△Figure 4-1 Wildfire rescue scenario

In the landing mission, AeroAgent also outperformed the other models, with an overall score of 97.4 and an average score per step of 48.7.

△Figure 4-2 Sea apron landing scenario

And in the wind turbine inspection test, AeroAgent was the only model able to complete the task.

△Figure 4-3 Wind turbine inspection scenario

In the navigation task, AeroAgent's per-step score of 4.44 was 40 times that of DRL and nearly 10 times that of the pure LLM.

△Figure 4-4 Airgen simulation experiment

The team also tested the UAV system in a real scene, using a simple experiment in guiding trapped people as a case study.

△Figure 5 Case experiment of guiding trapped people

Building on this work, the team is currently experimenting with intelligent drones for unmanned grazing at a yak ranch on a plateau to explore their practical application. With embodied intelligence as the goal, it will also explore how the agent can cooperate with other robots and multi-robot systems.

Paper address: https://arxiv.org/abs/2311.15033
