


The Beihang University team proposes a new embodied intelligence architecture that lets large models control drones
In the multi-modal era, large models can control drones too!
Once the vision module captures the starting conditions, the large model acting as the "brain" generates action instructions, which the drone then executes quickly and accurately.
Researchers on the intelligent drone team at Beihang University (Beijing University of Aeronautics and Astronautics), led by Professor Zhou Yaoming, have proposed an embodied intelligence architecture based on multi-modal large models.
The architecture has already been used to control unmanned aerial vehicles. How does this new agent perform, and what are the technical details?
"The agent is the brain, the controller is the cerebellum"
The team proposes an "agent as brain, controller as cerebellum" control architecture. The agent, as the brain's decision generator, focuses on generating high-level behaviors. The controller, as the cerebellum's motion controller, is mainly responsible for converting those high-level behaviors (such as an expected target point) into low-level system commands (such as rotor speeds).
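The division of labor described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `TargetPoint`, `agent_decide`, and `controller_step` are hypothetical, the "agent" is a keyword lookup rather than a large model, and the "controller" is a simple clamped proportional law that stops at a velocity command instead of going all the way to rotor speeds.

```python
from dataclasses import dataclass


@dataclass
class TargetPoint:
    """High-level behavior produced by the agent ("brain")."""
    x: float
    y: float
    z: float


def agent_decide(observation: str) -> TargetPoint:
    """Stand-in for the large-model agent: maps a scene description to a target point.

    A real agent would query a multi-modal model; this lookup is purely illustrative.
    """
    if "fire" in observation:
        return TargetPoint(10.0, 5.0, 20.0)
    return TargetPoint(0.0, 0.0, 10.0)  # default: hold 10 m above the origin


def controller_step(position, target, gain=0.5, max_speed=2.0):
    """Stand-in for the controller ("cerebellum"): proportional velocity command.

    Returns a clamped (vx, vy, vz) tuple; a real flight stack would go further
    and turn this into attitude and rotor-speed commands.
    """
    error = (target.x - position[0], target.y - position[1], target.z - position[2])
    return tuple(max(-max_speed, min(max_speed, gain * e)) for e in error)


target = agent_decide("smoke and fire to the north-east")
command = controller_step((0.0, 0.0, 10.0), target)
print(command)
```

The point of the split is that the slow, deliberate component only has to emit a target occasionally, while the fast loop can run at a fixed rate against the most recent target.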
Specifically, the research team believes this work makes three main contributions.
New system architecture for real-world use
The research team proposed a new system architecture that can be applied to real robots. It places the agent based on the multi-modal large model in the role of the brain, and the robot's motion planner and controller in the role of the cerebellum. The robot's perception system is analogous to human information-gathering organs such as the eyes and ears, and the robot's actuators are analogous to human effectors such as the hands.
△Figure 1 Hardware system architecture
These nodes are connected through ROS and communicate either by publishing and subscribing to messages or by service requests and responses. This differs from the traditional approach of controlling a robot end to end with a large model.
△Figure 2 Software system architecture
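The two communication patterns mentioned above, publish/subscribe and request/response, can be sketched without a ROS installation. The `MiniBus` class below is a hypothetical in-process stand-in written for this illustration and is not the ROS API; the topic and service names are likewise invented.

```python
from collections import defaultdict


class MiniBus:
    """Toy in-process message bus; hypothetical, NOT the ROS API."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> callbacks
        self._services = {}                    # service name -> handler

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # One-to-many, fire-and-forget: every subscriber gets the message.
        for callback in self._subscribers[topic]:
            callback(message)

    def advertise_service(self, name, handler):
        self._services[name] = handler

    def call_service(self, name, request):
        # One-to-one, blocking: the caller waits for the handler's response.
        return self._services[name](request)


bus = MiniBus()
received = []

# The controller node listens for target points published by the agent node.
bus.subscribe("/agent/target_point", received.append)

# The perception node answers on-demand queries from the agent node.
bus.advertise_service(
    "/perception/describe_scene",
    lambda request: {"camera": request["camera"], "text": "open field"},
)

bus.publish("/agent/target_point", {"x": 10.0, "y": 5.0, "z": 20.0})
reply = bus.call_service("/perception/describe_scene", {"camera": "down"})
print(received, reply)
```

Topics suit continuous streams such as targets or sensor data, while services suit one-off questions where the caller needs an answer before continuing.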
New Agent
- An automatic plan generation module with multi-modal perception and monitoring capabilities, which is good at handling emergencies while in standby mode.
- A multi-modal data memory module that supports multi-modal memory retrieval and reflection, giving the agent few-shot learning ability.
- An embodied intelligent action module that establishes a bridge for stable control between the embodied agent and other modules on ROS. It gives the agent access to other ROS nodes by using operations as the bridge.
- Completing an action may require several interactions to obtain the necessary parameters from sensors. This ensures the agent acts on comprehensive situational awareness and the actuators it has, and outputs specific actions stably.
△Figure 3 AeroAgent module architecture
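The idea that an action may need several interactions with sensors before it has all of its parameters can be sketched as follows. This is only an illustration under stated assumptions: `Action`, `fill_parameters`, the `land_on_pad` action, and the simulated sensors are all hypothetical and are not taken from AeroAgent.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """An entry in a hypothetical action library."""
    name: str
    required: tuple                      # parameter names needed before execution
    params: dict = field(default_factory=dict)

    def missing(self):
        return [p for p in self.required if p not in self.params]


def fill_parameters(action, sensors, max_rounds=5):
    """Query sensors round by round until every required parameter is known.

    `sensors` maps a parameter name to a zero-argument callable; in a ROS
    system each callable would wrap a topic read or a service call.
    Returns the number of rounds used, or raises if parameters stay missing.
    """
    for round_index in range(1, max_rounds + 1):
        for name in action.missing():
            value = sensors[name]()
            if value is not None:        # None means "not available yet"
                action.params[name] = value
        if not action.missing():
            return round_index
    raise RuntimeError(f"still missing: {action.missing()}")


# Simulated sensors: the landing-pad position only becomes visible on the second query.
pad_readings = iter([None, (3.0, -1.5)])
sensors = {
    "altitude": lambda: 12.0,
    "pad_position": lambda: next(pad_readings),
}

land = Action("land_on_pad", required=("altitude", "pad_position"))
rounds = fill_parameters(land, sensors)
print(rounds, land.params)
```

Refusing to execute until `missing()` is empty is one simple way to keep the output stable: the action is either fully specified or not issued at all.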
Bridge connecting large models and ROS
Why choose drones
The research team gave three reasons for choosing drones to test and simulate the system architecture.
First, most of the web-scale world knowledge contained in today's large multi-modal models (LMMs) is from a third-person perspective. Embodied intelligence in fields such as humanoid robots resembles a first-person perspective, with the human as the subject. The camera on a drone, especially a downward-looking one, is closer to a third-person (God's-eye) view. In addition, at the current stage LMMs are usually limited by computing resources, whether deployed locally or used through API services, so their responses arrive with some delay. Because a drone can hover, UAV mission planning can tolerate this delay, which remains an obstacle in fields such as autonomous driving. Together, these two points make UAVs suitable pioneers for validating the relevant theories and applications at the current level of technology.
Second, industrial drone applications such as wildfire rescue, agriculture, forestry and plant protection, unmanned grazing, and power inspection currently rely on pilots and experts cooperating in actual operations, so there is industrial demand for intelligent task execution.
Third, looking ahead, multi-agent collaboration is clearly needed in logistics, construction, factories, and other fields. There, a drone, as an embodied agent with a "God's-eye view", is suited to act as the central node that leads and allocates tasks, while other robots can be regarded as its actuators. The research therefore also has prospects for future development.
The team ran simulation experiments in the AirGen simulator and selected DRL and other methods as baselines for comparison. The experimental results are as follows. In the wildfire search and rescue scenario, AeroAgent achieved a standardized score of 100 points, an average of 2.04 points per step.
△Figure 4-1 Wildfire rescue scenario
△Figure 4-2 Sea apron landing scenario
△Figure 4-3 Wind turbine inspection scenario
△Figure 4-4 AirGen simulation experiment
△Figure 5 Case experiment of guiding trapped people
Paper address: https://arxiv.org/abs/2311.15033