From papers to code, from cutting-edge research to industrial implementation, comprehensively understand BEV perception

Apr 13, 2023, 10:31 PM
Autopilot

What exactly is BEV perception? Which aspects of it are the academic and industrial autonomous driving communities paying attention to? This article answers both questions.

In autonomous driving, having perception models learn powerful bird's-eye-view (BEV) representations is a growing trend that has attracted widespread attention from industry and academia. Most earlier autonomous driving models perform tasks such as detection, segmentation, and tracking in the front view or perspective view. By comparison, the BEV representation allows the model to better identify occluded vehicles and facilitates the development and deployment of subsequent modules (e.g., planning and control).

BEV perception research therefore has a large potential impact on autonomous driving and deserves long-term attention and investment from academia and industry. So what exactly is BEV perception, and which parts of it are academic and industrial leaders in autonomous driving focusing on? This article answers these questions through the BEVPerception Survey.

BEVPerception Survey is a practical, tool-oriented presentation of the paper "Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe", a collaboration between the OpenDriveLab autonomous driving team at Shanghai Artificial Intelligence Laboratory and SenseTime Research Institute. It is divided into two major sections: a review of the latest BEV perception literature and an open-source, PyTorch-based BEV perception toolbox.

  • Paper address: https://arxiv.org/abs/2209.05324
  • Project address: https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe
  • Summary interpretation, Technical interpretation

The literature review in BEVPerception Survey covers three parts: BEV camera, BEV LiDAR, and BEV fusion. BEV camera denotes vision-only or vision-centric algorithms for 3D object detection or segmentation from multiple surrounding cameras; BEV LiDAR covers detection or segmentation tasks that take point clouds as input; BEV fusion covers mechanisms that fuse inputs from multiple sensors, such as cameras, LiDAR, global navigation systems, odometry, HD maps, and the CAN bus.

The BEV perception toolbox is a platform for camera-based BEV 3D object detection. It provides an experimental platform on the Waymo dataset that supports hands-on tutorials and experiments on small-scale datasets.

Figure 1: BEVPerception Survey Framework

Specifically, BEV camera denotes algorithms for 3D object detection or segmentation from multiple surrounding cameras; BEV LiDAR denotes detection or segmentation tasks that take point clouds as input; BEV fusion takes the outputs of multiple sensors as input, such as cameras, LiDAR, GNSS, odometry, HD maps, and the CAN bus.

BEVPerception Literature Review

BEV Camera

BEV camera perception consists of three parts: a 2D feature extractor, a view transformer, and a 3D decoder. The figure below shows the BEV camera perception flow chart. In view transformation, there are two ways to encode 3D information: one is to predict depth information from 2D features; the other is to sample 2D features from 3D space.


Figure 2: BEV camera perception flow chart

For the 2D feature extractor, a great deal of experience from 2D perception tasks carries over to 3D perception tasks, for example in the form of backbone pretraining.

The view transformation module is what most distinguishes these methods from 2D perception systems. As shown in the figure above, there are generally two ways to perform view transformation: one transforms from 3D space to 2D space, and the other transforms from 2D space to 3D space. Both either exploit physical prior knowledge of 3D space or use additional 3D information for supervision. It is worth noting that not all 3D perception methods have a view transformation module. For example, some methods detect objects in 3D space directly from features in 2D space.
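The two directions of view transformation can be illustrated with a plain pinhole camera model. The sketch below is only an illustration and is not taken from the paper or its toolbox: the function names `lift_pixel` and `project_point`, the camera-frame convention (x right, y down, z forward), and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions. `lift_pixel` goes from 2D to 3D using a depth value (as a depth-predicting method would), and `project_point` goes from 3D to 2D to find where a 3D reference point should sample image features.

```python
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def lift_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """2D -> 3D: back-project pixel (u, v) with a given depth into the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project_point(point: Point3D,
                  fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int) -> Optional[Pixel]:
    """3D -> 2D: project a camera-frame point to pixel coordinates.

    Returns None when the point is behind the camera or falls outside
    the image, so no 2D feature can be sampled for it.
    """
    x, y, z = point
    if z <= 0.0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    if 0.0 <= u < width and 0.0 <= v < height:
        return (u, v)
    return None


if __name__ == "__main__":
    # Hypothetical intrinsics for a 100 x 80 image.
    fx, fy, cx, cy = 100.0, 100.0, 50.0, 40.0
    pixel = project_point((1.0, 0.5, 10.0), fx, fy, cx, cy, 100, 80)
    print("projected pixel:", pixel)
    if pixel is not None:
        print("lifted back:", lift_pixel(pixel[0], pixel[1], 10.0, fx, fy, cx, cy))
```

In a real multi-camera system, each point would first be moved from the vehicle frame into each camera frame with that camera's extrinsics; that step is omitted here to keep the sketch short.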

The 3D decoder receives features in 2D/3D space and outputs 3D perception results. Most 3D decoders are adapted from LiDAR-based perception models and perform detection in BEV space, but some 3D decoders exploit features in 2D space and directly regress the locations of 3D objects.

BEV Lidar

The common BEV LiDAR perception pipeline has two main branches for converting point cloud data into a BEV representation. The figure below shows the BEV LiDAR perception flow chart. The upper branch extracts point cloud features in 3D space, providing more accurate detection results. The lower branch extracts BEV features in 2D space, providing a more efficient network. In addition to point-based methods that operate on raw point clouds, voxel-based methods voxelize points into discrete grids, providing a more efficient representation by discretizing continuous 3D coordinates. On top of the discrete voxel representation, 3D convolution or 3D sparse convolution can be used to extract point cloud features.

Figure 3: BEV lidar sensing flow chart
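To make the voxelization step concrete, here is a minimal pure-Python sketch, assuming a fixed voxel size and a grid origin `pc_min`. The names `voxelize` and `bev_height_map` are hypothetical, and collapsing the vertical axis into a per-cell maximum height is just one simple hand-crafted BEV feature chosen for illustration, not the encoding used by any particular method in the survey.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def voxelize(points: Iterable[Point],
             voxel_size: Point,
             pc_min: Point) -> Dict[VoxelIndex, List[Point]]:
    """Group points by the integer voxel index they fall into.

    Continuous coordinates are discretized by flooring the offset from
    the grid origin divided by the voxel size along each axis.
    """
    voxels: Dict[VoxelIndex, List[Point]] = {}
    for p in points:
        idx = (
            int(math.floor((p[0] - pc_min[0]) / voxel_size[0])),
            int(math.floor((p[1] - pc_min[1]) / voxel_size[1])),
            int(math.floor((p[2] - pc_min[2]) / voxel_size[2])),
        )
        voxels.setdefault(idx, []).append(p)
    return voxels


def bev_height_map(voxels: Dict[VoxelIndex, List[Point]]) -> Dict[Tuple[int, int], float]:
    """Collapse the z axis: keep the maximum point height per (ix, iy) cell."""
    bev: Dict[Tuple[int, int], float] = {}
    for (ix, iy, _iz), pts in voxels.items():
        top = max(p[2] for p in pts)
        cell = (ix, iy)
        bev[cell] = max(bev.get(cell, top), top)
    return bev


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.2), (0.2, 0.3, 1.4), (1.2, 0.1, 0.5)]
    vox = voxelize(cloud, voxel_size=(0.5, 0.5, 0.5), pc_min=(0.0, 0.0, 0.0))
    print(len(vox), "occupied voxels")
    print(bev_height_map(vox))
```

Only occupied voxels are stored, which mirrors why sparse 3D convolution is attractive for point clouds: most of the grid is empty. The sketch does not crop points outside a detection range, which a real pipeline would normally do first.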

BEV Fusion

BEV fusion algorithms follow one of two typical pipeline designs, PV (perspective view) perception and BEV perception, which apply to both academia and industry. The figure below compares the PV perception and BEV perception flow charts. The main difference between the two lies in the 2D-to-3D conversion and fusion modules. In the PV perception pipeline, the results of different algorithms are first converted into 3D space and then fused using prior knowledge or hand-designed rules. In the BEV perception pipeline, the PV feature maps are converted to the BEV perspective and then fused in BEV space to obtain the final result, which retains as much of the original feature information as possible and avoids excessive manual design.

Figure 4: PV sensing (left) and BEV sensing (right) flow chart
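As a rough illustration of feature-level fusion in BEV space, the sketch below concatenates the channels of a camera BEV feature map and a LiDAR BEV feature map cell by cell. The article does not specify a fusion operator, so channel concatenation is an assumption made here for illustration, and `fuse_bev` is a hypothetical name. Each map is a nested list indexed as `[row][col][channel]`, and both maps are assumed to be already aligned on the same BEV grid.

```python
from typing import List

BevMap = List[List[List[float]]]  # indexed as [row][col][channel]


def fuse_bev(camera_bev: BevMap, lidar_bev: BevMap) -> BevMap:
    """Concatenate camera and LiDAR channels for every BEV cell.

    Both inputs must cover the same grid (same number of rows and
    columns); the channel counts may differ.
    """
    if len(camera_bev) != len(lidar_bev):
        raise ValueError("BEV maps must have the same number of rows")
    fused: BevMap = []
    for cam_row, lidar_row in zip(camera_bev, lidar_bev):
        if len(cam_row) != len(lidar_row):
            raise ValueError("BEV maps must have the same number of columns")
        fused.append([cam_cell + lidar_cell
                      for cam_cell, lidar_cell in zip(cam_row, lidar_row)])
    return fused


if __name__ == "__main__":
    camera = [[[1.0, 2.0], [3.0, 4.0]]]  # 1 x 2 grid, 2 channels per cell
    lidar = [[[9.0], [8.0]]]             # 1 x 2 grid, 1 channel per cell
    print(fuse_bev(camera, lidar))
```

Because both modalities already live on one grid, no hand-written association rules are needed at this stage; a downstream network can learn how to weigh the concatenated channels.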

Datasets suitable for BEV sensing models

There are many datasets for BEV perception tasks. A dataset typically consists of various scenes, and scene length differs from dataset to dataset. The following table summarizes the datasets commonly used in the academic community. The Waymo dataset has more diverse scenes and richer 3D detection box annotations than the other datasets.

Table 1: List of BEV sensing data sets

However, there is currently no publicly available software in the academic community for BEV perception tasks on the Waymo dataset. We therefore chose to develop on the Waymo dataset, hoping to promote the development of BEV perception tasks on it.

Toolbox - BEV perception toolbox

BEVFormer is a commonly used BEV perception method. It uses a spatiotemporal transformer to convert the features that the backbone network extracts from multi-view input into BEV features, and then feeds the BEV features into the detection head to obtain the final detection result. BEVFormer has two notable properties: it converts 2D image features to 3D features precisely, and the BEV features it extracts can be applied to different detection heads. We further improved BEVFormer's view transformation quality and final detection performance through a series of methods.

After winning first place in the CVPR 2022 Waymo Challenge with BEVFormer, we launched the BEV perception toolbox. It provides a set of easy-to-use Waymo Open Dataset data processing tools, integrates a series of methods that can significantly improve model performance (including but not limited to data augmentation, detection heads, loss functions, and model ensembling), and is compatible with open-source frameworks widely used in the field, such as mmdetection3d and detectron2. Compared with working on the basic Waymo dataset directly, the BEV perception toolbox streamlines usage for different types of developers. The figure below shows an example of using the BEV perception toolbox on the Waymo dataset.

Figure 5: Toolbox usage example based on Waymo data set

Summary

  • BEVPerception Survey summarizes BEV perception research in recent years, from high-level concepts to more in-depth discussion. Its comprehensive analysis of the BEV perception literature covers core issues such as depth estimation, view transformation, sensor fusion, and domain adaptation, and it explains in more depth how BEV perception is applied in industrial systems.
  • In addition to its theoretical contributions, BEVPerception Survey provides a practical toolbox for improving the performance of camera-based 3D bird's-eye-view (BEV) object detection. It includes a series of training data augmentation strategies, efficient encoder designs, loss function designs, and test-time augmentation and model ensemble strategies, as well as implementations of these techniques on the Waymo dataset. We hope to help more researchers use these techniques out of the box and to make work easier for researchers in the autonomous driving industry.

We hope that the BEVPerception Survey will not only help users easily use high-performance BEV perception models, but also serve as a good starting point for newcomers to BEV perception models. We are committed to pushing the boundaries of research and development in autonomous driving, and we look forward to exchanging views with the academic community to keep exploring the real-world application potential of autonomous driving research.
