OmniDrive: A framework for aligning large models with 3D driving tasks

May 06, 2024, 03:16 PM

OmniDrive starts with a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D before feeding them into the LLM.

Title: OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

Author affiliations: Beijing Institute of Technology, NVIDIA, Huazhong University of Science and Technology

Open-source code: GitHub - NVlabs/OmniDrive

The development of multimodal large language models (MLLMs) has led to growing interest in LLM-based autonomous driving that leverages their strong reasoning capabilities. Using those capabilities to improve planning behavior is challenging, however, because planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, this work proposes OmniDrive, a holistic framework for robust alignment between agent models and 3D driving tasks. The framework starts with a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D, which are then fed into the LLM. This query-based representation jointly encodes dynamic objects and static map elements (e.g., traffic lanes), providing a compact world model for perception-action alignment in 3D. The authors further propose a new benchmark of comprehensive visual question answering (VQA) tasks, covering scene description, traffic rules, 3D grounding, counterfactual reasoning, decision making, and planning. Extensive experiments demonstrate OmniDrive's strong reasoning and planning capabilities in complex 3D scenes.

Network structure

[Figure: OmniDrive network structure]
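The sparse-query idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-head cross-attention layer in which a small set of queries attends to flattened multi-view image features, and the resulting compact tokens are projected to the LLM embedding width. The function name `compress_with_sparse_queries`, all shapes, and the random weights are hypothetical, and the 3D position encoding that performs the actual "lifting" is omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress_with_sparse_queries(image_feats, queries, w_q, w_k, w_v, w_out):
    """Single-head cross-attention from sparse queries to image features.

    image_feats: (num_pixels, feat_dim) flattened multi-view features
    queries:     (num_queries, feat_dim) sparse queries (hypothetical)
    returns:     (num_queries, llm_dim) tokens sized for an LLM input
    """
    q = queries @ w_q                      # (num_queries, feat_dim)
    k = image_feats @ w_k                  # (num_pixels, feat_dim)
    v = image_feats @ w_v                  # (num_pixels, feat_dim)
    # Each query distributes attention over all image locations.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    compressed = attn @ v                  # (num_queries, feat_dim)
    # Project the compact representation to the LLM embedding width.
    return compressed @ w_out              # (num_queries, llm_dim)


# Illustrative sizes only; none of these come from the paper.
rng = np.random.default_rng(0)
num_views, height, width, feat_dim = 6, 8, 16, 32
num_queries, llm_dim = 16, 64

image_feats = rng.standard_normal((num_views * height * width, feat_dim))
queries = rng.standard_normal((num_queries, feat_dim))
w_q, w_k, w_v = (rng.standard_normal((feat_dim, feat_dim)) * 0.1 for _ in range(3))
w_out = rng.standard_normal((feat_dim, llm_dim)) * 0.1

llm_tokens = compress_with_sparse_queries(image_feats, queries, w_q, w_k, w_v, w_out)
print(llm_tokens.shape)  # (16, 64)
```

In this toy setting, 768 image locations are summarized by 16 tokens, which is the sense in which a query-based representation can keep the LLM input short while still drawing on every camera view.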

Experimental results

[Figure: OmniDrive experimental results]
