


Mass production killer! P-MapNet: using the low-precision SDMap prior, mapping performance improves sharply by nearly 20 points!
Foreword
One way current autonomous driving systems try to remove their dependence on high-precision maps is online map generation, but its perception performance at long range is still poor. To this end, we propose P-MapNet, where the “P” stands for the focus on fusing map priors to improve model performance. Specifically, we exploit the prior information in both SDMap and HDMap. On the one hand, we extract weakly aligned SDMap data from OpenStreetMap and encode it as a separate branch that supplements the sensor input. The SDMap is only weakly aligned with the actual HD Map, but our structure based on the cross-attention mechanism can adaptively focus on the SDMap skeleton and brings significant performance improvements. On the other hand, we propose a refinement module that uses an MAE (masked autoencoder) to capture the prior distribution of HDMap. This module helps generate a distribution that is more consistent with the actual map and helps reduce the effects of occlusion, artifacts, etc. We conduct extensive experimental validation on the nuScenes and Argoverse2 datasets.
Figure 1
In summary, our contributions are as follows:
(1) Our SDMap prior can improve the performance of online map generation, for both rasterized output (up to 18.73 mIoU improvement) and vectorized output (up to 8.50 mAP improvement).
(2) Our HDMap prior can improve the map perceptual metric by up to 6.34%.
(3) P-MapNet can switch to different inference modes to trade off accuracy and efficiency.
P-MapNet is a long-distance HD Map generation solution that can bring greater improvements to farther sensing ranges. Our code and model have been publicly released at https://jike5.github.io/P-MapNet/.
Review of related work
(1) Online map generation
Producing an HD Map mainly involves SLAM mapping, automatic annotation, manual annotation and other steps. This makes HD Maps expensive and limits how fresh they can be. Therefore, online map generation is crucial for autonomous driving systems. HDMapNet represents map elements on a rasterized grid and uses pixel-wise prediction plus post-processing to obtain vectorized prediction results. Some recent methods, such as MapTR, PivotNet and StreamMapNet, implement end-to-end vectorized prediction based on the Transformer architecture. However, these methods only use sensor input, and their performance is still limited in complex conditions such as occlusion and extreme weather.
(2) Long-distance map perception
To make the results of online map generation more useful to downstream modules, some research attempts to further expand the range of map perception. SuperFusion[7] achieves long-distance prediction up to 90m ahead by fusing lidar and cameras and using a depth-aware BEV transformation. NeuralMapPrior[8] enhances the quality of current online observations and expands the perception range by maintaining and updating a global neural map prior. [6] obtains BEV features by aggregating satellite images and vehicle sensor data, and makes its predictions from them. MV-Map focuses on offline, long-distance map generation. It optimizes BEV features by aggregating the features of all associated frames and using neural radiance fields.
Overview of P-MapNet
The overall framework is shown in Figure 2.
Figure 2
Input: The system takes a lidar point cloud and the images from the surround cameras as input. A common HDMap generation task (such as HDMapNet) can be defined as a feature extraction step applied to these sensor inputs, followed by a segmentation head that produces the HDMap prediction.
The P-MapNet we propose combines the SDMap and HDMap priors. In this new task (the setting with both priors), the SDMap prior is added as an input, and the refinement module proposed in this article is applied to the output of the segmentation head.
The refinement module learns the HD Map distribution prior through pre-training. Similarly, when only the SDMap prior is used and the refinement module is left out, we obtain the SDMap-only setting.
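The three settings described above can be written compactly as follows. This is only a sketch: the symbol names are assumptions chosen here for illustration, and the point at which the SDMap prior enters is written loosely as a second argument of the segmentation stage.

```latex
% Assumed notation (illustrative, not the paper's symbols):
%   P: lidar point cloud, I: surround-camera images, S: SDMap prior
%   F_feat: feature extraction, F_seg: segmentation head, F_refine: refinement module
\begin{align}
M_{\text{baseline}} &= F_{\text{seg}}\big(F_{\text{feat}}(P, I)\big) \\
M_{\text{SD-only}}  &= F_{\text{seg}}\big(F_{\text{feat}}(P, I),\, S\big) \\
M_{\text{SD+HD}}    &= F_{\text{refine}}\Big(F_{\text{seg}}\big(F_{\text{feat}}(P, I),\, S\big)\Big)
\end{align}
```

Read this way, the SDMap-only setting is the full model with the refinement module removed, and the baseline is the SDMap-only model without the prior input.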
Output: For map generation tasks, there are usually two map representations: rasterized and vectorized. Since the two prior modules designed in this article are better suited to rasterized output, we mainly focus on the rasterized representation.
3.1 SDMap Prior module
SDMap data generation
Our study is based on the nuScenes and Argoverse2 datasets. We use OpenStreetMap data to generate SD Map data for the areas covered by these datasets, and transform its coordinate system using the vehicle GPS to obtain the SD Map of the corresponding area.
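The coordinate transformation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the SD Map road skeleton is already available as a polyline of planar (x, y) points in a global metric frame, that the ego pose is given as a position plus a yaw angle, and that the output is a binary BEV raster. The function names `global_to_ego` and `rasterize_polyline` and all numeric values are hypothetical.

```python
import math


def global_to_ego(points, ego_xy, ego_yaw):
    """Map global (x, y) points into the ego frame (x forward, y left)."""
    cos_y, sin_y = math.cos(ego_yaw), math.sin(ego_yaw)
    ego_points = []
    for x, y in points:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        # Apply the inverse rotation R(-yaw) to the offset from the ego position.
        ego_points.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy))
    return ego_points


def rasterize_polyline(points, x_range, y_range, resolution):
    """Draw an ego-frame polyline into a binary BEV grid (rows = y, cols = x)."""
    width = int(round((x_range[1] - x_range[0]) / resolution))
    height = int(round((y_range[1] - y_range[0]) / resolution))
    grid = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample densely (half-cell spacing) along the segment.
        steps = max(1, math.ceil(length / (resolution / 2)))
        for i in range(steps + 1):
            t = i / steps
            col = math.floor((x0 + t * (x1 - x0) - x_range[0]) / resolution)
            row = math.floor((y0 + t * (y1 - y0) - y_range[0]) / resolution)
            if 0 <= row < height and 0 <= col < width:
                grid[row][col] = 1
    return grid


# Hypothetical example: a road running 10 m straight ahead of the vehicle.
road_global = [(100.0, 50.0), (100.0, 60.0)]
road_ego = global_to_ego(road_global, ego_xy=(100.0, 50.0), ego_yaw=math.pi / 2)
sd_raster = rasterize_polyline(road_ego, (-30.0, 30.0), (-15.0, 15.0), 0.5)
```

Because GPS-based alignment is imperfect, a raster produced this way is only weakly aligned with the true HD Map, which is the motivation for the attention-based fusion that follows.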
BEV Query
As shown in Figure 2, we first perform feature extraction and perspective transformation on the image data and feature extraction on the point cloud to obtain BEV features. The BEV features are then downsampled by a convolutional network to obtain new, lower-resolution BEV features, and this feature map is flattened to obtain the BEV Query.
SD Map prior fusion
For the SD Map data, features are first extracted by a convolutional network, and the resulting features are then fused with the BEV Query through a cross-attention mechanism.
The BEV features obtained after the cross-attention mechanism are passed through the segmentation head to obtain the initial prediction of map elements.
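The flattening and fusion steps can be illustrated with a small single-head attention sketch. It assumes the BEV Query tokens act as queries and the SD Map features act as keys and values, and it leaves out the learned projections, multiple heads and residual connections that a real Transformer layer would have. The helper names and toy values are hypothetical.

```python
import math


def flatten_bev(feature_map):
    """Flatten an H x W grid of C-dim feature vectors into H*W query tokens."""
    return [cell for row in feature_map for cell in row]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; each query attends to all keys."""
    scale = math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy inputs: a 1 x 2 BEV grid and two SD Map tokens, 2 channels each.
bev_features = [[[4.0, 0.0], [0.0, 4.0]]]
sd_tokens = [[4.0, 0.0], [0.0, 4.0]]
bev_query = flatten_bev(bev_features)
fused = cross_attention(bev_query, keys=sd_tokens, values=sd_tokens)
```

Each BEV token ends up as a weighted mix of SD Map tokens, with the weights set by feature similarity rather than by pixel position, which is why this kind of fusion can tolerate a weakly aligned prior.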
3.2 HDMap Prior module
If the rasterized HD Map is fed directly into the original MAE, the MAE is trained with an MSE loss, which makes it unusable as a refinement module. So in this article, we replace the output of the MAE with our segmentation head. To give the predicted map elements continuity and authenticity (closer to the distribution of the actual HD Map), we use a pre-trained MAE module for refinement. Training this module consists of two steps: the first step uses self-supervised learning to train the MAE module to learn the distribution of the HD Map, and the second step fine-tunes all modules of the network, using the weights obtained in the first step as initial weights.
In the first step, pre-training, the ground-truth HD Map from the dataset is randomly masked and used as the network input, and the training goal is to reconstruct the complete HD Map.
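The random masking used to build a pre-training input can be sketched as below. This is an illustration only: it assumes patch-wise masking on a single-channel raster, and the patch size, mask ratio and function name `random_patch_mask` are hypothetical choices, not values taken from the paper.

```python
import random


def random_patch_mask(hd_map, patch_size, mask_ratio, seed=0):
    """Zero out a random subset of square patches; return (masked_map, mask)."""
    height, width = len(hd_map), len(hd_map[0])
    patches = [
        (r, c)
        for r in range(0, height, patch_size)
        for c in range(0, width, patch_size)
    ]
    rng = random.Random(seed)  # seeded for reproducibility
    num_masked = int(round(mask_ratio * len(patches)))
    masked_map = [row[:] for row in hd_map]  # copy so the input stays untouched
    mask = [[0] * width for _ in range(height)]
    for r0, c0 in rng.sample(patches, num_masked):
        for r in range(r0, min(r0 + patch_size, height)):
            for c in range(c0, min(c0 + patch_size, width)):
                masked_map[r][c] = 0
                mask[r][c] = 1  # 1 marks cells the network must reconstruct
    return masked_map, mask


# Hypothetical example: an 8 x 8 raster split into four 4 x 4 patches, half masked.
ground_truth = [[1] * 8 for _ in range(8)]
network_input, mask = random_patch_mask(ground_truth, patch_size=4, mask_ratio=0.5)
```

During pre-training, `network_input` would be fed to the network and the unmasked `ground_truth` would serve as the target.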
In the second step, fine-tuning, the weights pre-trained in the first step are used as the initial weights. The complete network then applies the refinement module to the initial prediction produced by the segmentation head.
4. Experiment
4.1 Datasets and metrics
We evaluate on two mainstream datasets: nuScenes and Argoverse2. To demonstrate the effectiveness of our method at long distances, we set three different perception ranges, the shortest being 60m × 30m and the longest 240m × 60m. The BEV grid resolution is 0.15m in the shortest range and 0.3m in the other two. We use the mIoU metric to evaluate rasterized prediction results and mAP to evaluate vectorized prediction results. To evaluate how realistic the generated map is, we also use the LPIPS metric as the map perceptual metric.
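For reference, the mIoU metric on rasterized output can be computed as in the sketch below. It uses the standard intersection-over-union definition on a single sample with one binary mask per class. How the actual evaluation aggregates over a whole dataset is not stated here, so treat this as an illustration rather than the benchmark's evaluation code. The function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, target):
    """pred, target: lists with one binary H x W mask (nested lists) per class."""
    ious = []
    for pred_mask, target_mask in zip(pred, target):
        inter = union = 0
        for pred_row, target_row in zip(pred_mask, target_mask):
            for p, t in zip(pred_row, target_row):
                inter += 1 if (p and t) else 0
                union += 1 if (p or t) else 0
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical two-class example on a 2 x 2 grid.
pred = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
target = [[[1, 0], [0, 0]], [[0, 0], [1, 1]]]
score = mean_iou(pred, target)  # class IoUs are 0.5 and 1.0
```

Here the first class has one overlapping cell out of two predicted or labeled cells, and the second class matches exactly, so the mean is 0.75.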
4.2 Results
Comparison with SOTA results: We compare the map generation results of the proposed method with current SOTA methods at short distance (60m × 30m) and long distance (90m × 30m). As shown in Table II, our method shows superior performance compared to existing vision-only and multi-modal (RGB + LiDAR) methods.
We also compare with HDMapNet [14] at different ranges and with different sensor modalities; the results are summarized in Table I and Table III. Our method achieves a 13.4% improvement in mIoU in the 240m × 60m range. As the perception range approaches or even exceeds the sensor detection range, the SDMap prior becomes more effective, which validates its efficacy. Finally, we leverage the HD map prior to bring further performance improvements by refining the initial prediction results, making them more realistic and eliminating false results.
Perceptual metric for the HDMap prior. The HDMap prior module maps the network’s initial predictions onto the distribution of HD maps, making them more realistic. To evaluate how realistic the output of the HDMap prior module is, we use the perceptual metric LPIPS (the lower the value, the better the performance). As shown in Table IV, the LPIPS metric improves more in the setting with both priors than in the SDMap-only setting.
Visualization: