Mass production killer! P-MapNet: Using the prior of the low-precision SDMap, mapping performance improves dramatically, by nearly 20 points!


WBOY

Mar 28, 2024 pm 02:36 PM

Autopilot HD map

Preface

One way for current autonomous driving systems to reduce their dependence on high-precision maps is online map generation, but the perception performance of these algorithms at long distances is still poor. To this end, we propose P-MapNet, where the "P" emphasizes fusing map priors to improve model performance. Specifically, we exploit the prior information in both SDMap and HDMap. On the one hand, we extract weakly aligned SDMap data from OpenStreetMap and encode it as a separate branch that supplements the sensor input. Although this input is only weakly aligned with the actual HD Map, our structure based on the cross-attention mechanism can adaptively focus on the SDMap skeleton and brings significant performance improvements. On the other hand, we propose a refinement module that uses MAE to capture the prior distribution of HDMap. This module helps generate a distribution that is more consistent with the actual map and helps reduce the effects of occlusion, artifacts, etc. We conduct extensive experimental validation on the nuScenes and Argoverse2 datasets.

Figure 1

In summary, our contributions are as follows:

(1) Our SDMap prior improves the performance of online map generation for both rasterized output (by up to 18.73 mIoU) and vectorized output (by up to 8.50 mAP).

(2) Our HDMap prior can improve the map perceptual metric by up to 6.34%.

(3) P-MapNet can switch to different inference modes to trade off accuracy and efficiency.

P-MapNet is a long-distance HD Map generation solution that brings greater improvements at longer perception ranges. Our code and model have been publicly released at https://jike5.github.io/P-MapNet/.

Review of related work

(1) Online map generation

The production of HD Maps mainly includes SLAM mapping, automatic annotation, manual annotation and other steps. This makes HD Maps costly and limits their freshness. Therefore, online map generation is crucial for autonomous driving systems. HDMapNet represents map elements in rasterized form and uses pixel-wise prediction and post-processing to obtain vectorized prediction results. Some recent methods, such as MapTR, PivotNet and StreamMapNet, implement end-to-end vectorized prediction based on the Transformer architecture. However, these methods only use sensor input, and their performance is still limited in complex environments such as occlusion and extreme weather.

(2) Long-distance map perception

To make the results of online map generation more useful to downstream modules, some research attempts to further expand the range of map perception. SuperFusion [7] achieves long-distance prediction of 90m in the forward direction by fusing lidar and cameras and using a depth-aware BEV transformation. NeuralMapPrior [8] enhances the quality of current online observations and expands the range of perception by maintaining and updating global neural map priors. [6] obtains BEV features by aggregating satellite images and vehicle sensor data, and makes predictions from them. MV-Map focuses on offline, long-distance map generation. This method optimizes BEV features by aggregating the features of all associated frames and using neural radiance fields.

Overview of P-MapNet

The overall framework is shown in Figure 2.

Figure 2

Input: The system takes a LiDAR point cloud and images from the surround-view cameras as input. A common HDMap generation task (such as HDMapNet) can be described as a feature extraction step applied to these sensor inputs, followed by a segmentation head that outputs the HDMap prediction.

P-MapNet combines SD Map and HD Map priors. In this new task setting, which uses both priors, the SDMap prior is supplied as an additional input, and a refinement module is applied to the output of the segmentation head. The refinement module learns the HD Map distribution prior through pre-training. When only the SDMap prior is used and the refinement module is omitted, we obtain the SDMap-only setting.
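The three task formulations described above can be written compactly as follows. The notation is an assumption made for illustration and may differ from the symbols used in the paper; in particular, the fusion of the SDMap prior is folded into the feature extraction term here.

```latex
% Assumed notation (illustrative only):
%   P: LiDAR point cloud, I_1..I_N: surround-view camera images, S: SDMap prior
%   F_feat: feature extraction (including prior fusion), F_seg: segmentation head,
%   F_ref: refinement module pre-trained on HD Maps
\begin{align}
M_{\text{baseline}} &= F_{\text{seg}}\big(F_{\text{feat}}(P, \{I_i\}_{i=1}^{N})\big) \\
M_{\text{SDMap-only}} &= F_{\text{seg}}\big(F_{\text{feat}}(P, \{I_i\}_{i=1}^{N}, S)\big) \\
M_{\text{both priors}} &= F_{\text{ref}}\big(M_{\text{SDMap-only}}\big)
\end{align}
```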

Output: Map generation tasks usually use one of two map representations: rasterized and vectorized. Since the two prior modules designed in this article are better suited to rasterized output, we mainly focus on the rasterized representation.

3.1 SDMap Prior module

SDMap data generation

Our study is based on the nuScenes and Argoverse2 datasets. We use OpenStreetMap data to generate SD Map data for the areas covered by these datasets, and transform the coordinate system using the vehicle GPS to obtain the SD Map of the corresponding area.
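The coordinate transformation step can be illustrated with a minimal sketch. It assumes the OpenStreetMap road polylines have already been projected into a metric global frame, and that the vehicle pose is available as a 2D position plus a yaw angle. The function names `global_to_ego` and `crop_to_range` are hypothetical and not taken from the P-MapNet code.

```python
import math


def global_to_ego(points_xy, ego_xy, ego_yaw):
    """Transform 2D points from a global metric frame into the ego (vehicle) frame.

    points_xy: list of (x, y) tuples in the global frame, in meters (assumed input)
    ego_xy:    (x, y) position of the vehicle in the global frame
    ego_yaw:   vehicle heading in radians, counter-clockwise from the global x-axis
    """
    cos_y, sin_y = math.cos(ego_yaw), math.sin(ego_yaw)
    transformed = []
    for x, y in points_xy:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        # Rotate by -yaw so that the ego x-axis points forward.
        transformed.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy))
    return transformed


def crop_to_range(points_xy, range_x, range_y):
    """Keep only points inside a range_x by range_y box centered on the vehicle."""
    half_x, half_y = range_x / 2.0, range_y / 2.0
    return [(x, y) for x, y in points_xy if abs(x) <= half_x and abs(y) <= half_y]
```

With the vehicle at (10, 5) facing the global +y direction, a road point at (10, 15) ends up 10 m straight ahead in the ego frame. The cropped points could then be rasterized onto the BEV grid.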

BEV Query

As shown in Figure 2, we first perform feature extraction and perspective transformation on the image data and feature extraction on the point cloud to obtain BEV features. The BEV features are then downsampled through a convolutional network to obtain new BEV features, and the resulting feature map is flattened to obtain the BEV Query.
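The downsample-then-flatten step can be sketched in plain Python. This is only an illustration: 2x2 average pooling stands in for the convolutional downsampling network, and the helper names `avg_pool2x2` and `flatten_to_queries` are hypothetical.

```python
def avg_pool2x2(feat):
    """Downsample an H x W x C feature map (nested lists) by 2x2 average pooling.

    This is a stand-in for a learned convolutional downsampling network.
    A trailing odd row or column is dropped.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    pooled = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            cell = [
                (feat[i][j][k] + feat[i][j + 1][k]
                 + feat[i + 1][j][k] + feat[i + 1][j + 1][k]) / 4.0
                for k in range(c)
            ]
            row.append(cell)
        pooled.append(row)
    return pooled


def flatten_to_queries(feat):
    """Flatten an H x W x C feature map into a list of H*W query vectors (row-major)."""
    return [cell for row in feat for cell in row]
```

Each flattened vector corresponds to one BEV grid cell, so the number of queries drops by a factor of four for every 2x downsampling step, which keeps the later attention step affordable.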

SD Map prior fusion

For SD Map data, features are first extracted through a convolutional network, and the resulting SD Map features are then fused with the BEV Query through a cross-attention mechanism.

The BEV features obtained after cross-attention are passed through the segmentation head to obtain the initial prediction of map elements.
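The fusion step can be illustrated with a minimal single-head scaled dot-product cross-attention, where the BEV queries attend to SD Map features. This sketch omits the learned query/key/value projections, multiple heads, and positional encodings that a real Transformer layer would have; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    queries: list of Nq vectors of dimension d (for example, BEV queries)
    keys:    list of Nk vectors of dimension d (for example, SD Map features)
    values:  list of Nk vectors (for example, the same SD Map features)
    Returns (outputs, weights): Nq output vectors and Nq rows of Nk weights.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[t] for wj, v in zip(w, values)) for t in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights
```

Because every BEV query computes its own weights over all SD Map locations, a query can draw on a road skeleton that is offset from its own grid cell, which is one way to see why attention tolerates weak alignment.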

3.2 HDMap Prior module

If the rasterized HD Map were fed directly into the original MAE, the MAE would be trained with an MSE loss and could not be used as a refinement module. We therefore replace the output of the MAE with our segmentation head. To make the predicted map elements continuous and realistic (closer to the distribution of actual HD Maps), we use a pre-trained MAE module for refinement. Training this module consists of two steps: first, the MAE module is trained with self-supervised learning to learn the distribution of HD Maps; second, all modules of the network are fine-tuned, using the weights obtained in the first step as initial weights.

In the first step, pre-training, the ground-truth HD Map from the dataset is randomly masked and used as the network input, and the training objective is to reconstruct the complete HD Map.
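The random masking used to build pre-training inputs can be sketched as follows. This is an illustration under assumptions: the HD Map is a single-channel raster stored as nested lists, masking works on non-overlapping square patches, and the patch size, mask ratio, and function name `random_patch_mask` are hypothetical choices rather than values from the paper.

```python
import random


def random_patch_mask(grid, patch, mask_ratio, seed=0):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    grid:       H x W nested list (rasterized HD Map), H and W divisible by patch
    patch:      side length of each square patch, in grid cells
    mask_ratio: fraction of patches to hide, between 0 and 1
    Returns (masked_grid, masked_patches); the input grid is left unchanged.
    """
    h, w = len(grid), len(grid[0])
    patches = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    rng = random.Random(seed)
    n_mask = int(round(mask_ratio * len(patches)))
    masked_patches = set(rng.sample(patches, n_mask))
    masked_grid = [row[:] for row in grid]
    for i, j in masked_patches:
        for di in range(patch):
            for dj in range(patch):
                masked_grid[i + di][j + dj] = 0
    return masked_grid, masked_patches
```

During pre-training, the masked grid would be the network input and the original grid the reconstruction target, so the network has to learn what plausible map structure looks like in the hidden regions.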

In the second step, fine-tuning, the weights pre-trained in the first step are used as the initial weights, and the complete network, which chains feature extraction, the segmentation head and the refinement module, is fine-tuned.

4. Experiment

4.1 Datasets and indicators

We evaluate on two mainstream datasets: nuScenes and Argoverse2. To demonstrate the effectiveness of our method at long distances, we set three different perception ranges. The BEV grid resolution is 0.15m in one of these ranges and 0.3m in the other two. We use the mIoU metric to evaluate rasterized prediction results and mAP to evaluate vectorized prediction results. To evaluate how realistic the generated maps are, we also use the LPIPS metric as a perceptual metric.
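To make the rasterized metric concrete, here is a minimal sketch of grid sizing and mIoU on class-id rasters. It is a generic formulation, not the exact evaluation protocol of the paper: which classes are counted (for example, whether background is included) is an assumption, and the function names are hypothetical.

```python
def grid_shape(range_x_m, range_y_m, resolution_m):
    """Number of BEV grid cells along each axis for a given range and resolution."""
    return (round(range_x_m / resolution_m), round(range_y_m / resolution_m))


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two H x W rasters of integer class ids.

    Classes that appear in neither raster are skipped so they do not distort the mean.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = (p == c), (t == c)
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, a 240m × 60m range at a 0.3m resolution (a pairing assumed here for illustration) gives an 800 × 200 grid, which shows how quickly the number of cells, and hence BEV queries, grows with range.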

4.2 Results

Comparison with SOTA results: We compare the map generation results of the proposed method with current SOTA methods at short range (60m × 30m) and long range (90m × 30m). As shown in Table II, our method outperforms existing vision-only and multi-modal (RGB and LiDAR) methods.


We also compared performance with HDMapNet [14] at different distances and with different sensor modalities; the results are summarized in Table I and Table III. Our method achieves a 13.4% improvement in mIoU in the 240m × 60m range. As the perception range approaches or even exceeds the sensor detection range, the benefit of the SDMap prior becomes more significant, which validates its efficacy. Finally, we leverage the HD map prior to further improve performance by refining the initial prediction results, making them more realistic and eliminating false results.


Perceptual metric of the HDMap prior. The HDMap prior module maps the network's initial predictions onto the distribution of HD maps, making them more realistic. To evaluate how realistic the output of the HDMap prior module is, we use the perceptual metric LPIPS (the lower the value, the better the performance). As shown in Table IV, the setting that uses both priors improves LPIPS more than the SDMap-only setting.


Visualization: [figure]
