I am very happy to be invited to participate in the Heart of Autonomous Driving event, where we will share our anti-disturbance method for the online reconstruction of vectorized high-definition maps, ADMap. You can find our code at https://github.com/hht1996ok/ADMap. Thank you all for your attention and support.
In the field of autonomous driving, online high-definition map reconstruction is of great significance for planning and prediction tasks. Recent work has produced many high-performance map reconstruction models to meet this need. However, the predicted point sequence within a vectorized instance may jitter or appear jagged due to prediction bias, which degrades downstream tasks. We therefore propose the Anti-Disturbance Map reconstruction framework (ADMap). We aim to balance model speed and overall accuracy so that engineers face no extra burden at deployment time. To this end, we propose three efficient and effective modules: Multi-Scale Perception Neck (MPN), Instance Interactive Attention (IIA), and Vector Direction Difference Loss (VDDL). By exploring the point-order relationships between and within instances in a cascaded manner, our model better supervises the point-order prediction process.
We verified the effectiveness of ADMap on the nuScenes and Argoverse2 datasets. Experimental results show that ADMap achieves the best performance across various benchmarks. On the nuScenes benchmark, ADMap improves mAP by 4.2% and 5.5% over the baseline using camera-only and multi-modal data, respectively. ADMapv2 not only reduces inference latency but also significantly improves on its baseline, reaching an mAP of up to 82.8%. On the Argoverse2 dataset, the mAP of ADMapv2 increases to 62.9% while the frame rate remains at 14.8 FPS.
In summary, the main contributions of our proposed ADMap are the three modules introduced above, MPN, IIA, and VDDL, which together enable stable, anti-disturbance point-order prediction, along with state-of-the-art performance on both the nuScenes and Argoverse2 benchmarks.
As shown in Figure 1, the predicted points of an instance often inevitably jitter or drift. This jitter causes the reconstructed instance vector to become unsmooth or jagged, seriously affecting the quality and practicality of online high-definition maps. We believe the reason is that existing models do not fully consider the interactions between and within instances: incomplete interaction between instance points and the map's topological information leads to inaccurate predicted positions. In addition, supervision such as the L1 loss and cosine embedding loss alone cannot effectively exploit geometric relationships to constrain the prediction of instance points. The network needs to use the vector line segments between points to finely capture the direction of the point sequence and thus constrain each point's prediction more accurately.
To alleviate the above problems, we propose the Anti-Disturbance Map reconstruction framework (ADMap) to achieve real-time and stable reconstruction of vectorized high-definition maps.
As shown in Figure 2, ADMap uses the Multi-Scale Perception Neck (MPN), Instance Interactive Attention (IIA), and Vector Direction Difference Loss (VDDL) to predict the point-order topology more precisely. MPN, IIA, and VDDL are introduced in turn below.
To obtain more detailed BEV features, we introduce the Multi-Scale Perception Neck (MPN). MPN receives the fused BEV features as input. Through downsampling, the BEV features at each level are connected to an upsampling layer that restores the feature map to its original size. Finally, the feature maps at all levels are merged into multi-scale BEV features.
The dotted lines in Figure 2 indicate steps executed only during training, while the solid lines indicate steps executed during both training and inference. During training, the multi-scale BEV feature map and the BEV feature map at each level are fed into the Transformer decoder, which lets the network predict the scene's instance information at different scales and capture more refined multi-scale features. During inference, MPN retains only the multi-scale BEV features and does not output the per-level feature maps, which keeps the neck's resource usage during inference unchanged.
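To make the structure concrete, here is a minimal PyTorch sketch of an MPN-style neck written from the description above; the channel width, number of pyramid levels, and module names are our own illustrative assumptions rather than the official ADMap implementation.

```python
import torch
import torch.nn as nn

class MultiScalePerceptionNeck(nn.Module):
    """Sketch of an MPN-style neck: downsample the fused BEV features,
    upsample each level back to the original size, and merge all levels
    into multi-scale BEV features. The per-level maps are returned only
    in training mode (the dotted path in Figure 2)."""

    def __init__(self, channels: int = 256, num_levels: int = 2):
        super().__init__()
        self.downs = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            for _ in range(num_levels)
        ])
        self.ups = nn.ModuleList([
            nn.Upsample(scale_factor=2 ** (i + 1), mode="bilinear", align_corners=False)
            for i in range(num_levels)
        ])
        # Fuse the original map with the restored per-level maps.
        self.merge = nn.Conv2d(channels * (num_levels + 1), channels, kernel_size=1)

    def forward(self, bev: torch.Tensor):
        restored, x = [], bev
        for down, up in zip(self.downs, self.ups):
            x = down(x)              # lower-resolution BEV level
            restored.append(up(x))   # restore to the original size
        multi_scale = self.merge(torch.cat([bev, *restored], dim=1))
        if self.training:
            # Training only: the per-level maps also feed the decoder
            # for multi-scale supervision.
            return multi_scale, restored
        return multi_scale, None

# Usage sketch:
# feats, levels = MultiScalePerceptionNeck()(torch.randn(1, 256, 100, 200))
```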
The Transformer decoder defines a set of instance-level queries and a set of point-level queries, and then shares the point-level queries across all instances. The hierarchical queries are defined as:

$$q^{hie}_{ij} = q^{ins}_i + q^{pt}_j$$

where $q^{ins}_i$ is the $i$-th instance-level query and $q^{pt}_j$ is the $j$-th shared point-level query.
The decoder consists of several cascaded decoding layers that iteratively update the hierarchical queries. In each decoding layer, the hierarchical queries are fed into a self-attention mechanism that exchanges information among them, and deformable attention is used to let the hierarchical queries interact with the multi-scale BEV features.
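As a toy illustration (our own sketch, not the released code), the hierarchical queries can be built by broadcasting the sum of the instance-level and shared point-level embeddings; the sizes below are hypothetical.

```python
import torch
import torch.nn as nn

num_instances, num_points, dim = 50, 20, 256  # hypothetical sizes

# One learnable embedding per instance, plus one set of point-level
# embeddings shared by every instance.
instance_queries = nn.Parameter(torch.randn(num_instances, dim))
point_queries = nn.Parameter(torch.randn(num_points, dim))

# q_hie[i, j] = q_ins[i] + q_pt[j], realized by broadcasting.
hierarchical_queries = instance_queries[:, None, :] + point_queries[None, :, :]
print(hierarchical_queries.shape)  # torch.Size([50, 20, 256])
```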
To better capture the features of each instance in the decoding stage, we propose Instance Interactive Attention (IIA), which consists of instance self-attention and point self-attention. Unlike MapTRv2, which extracts instance-level and point-level embeddings in parallel, IIA extracts the query embeddings in a cascaded manner: feature interaction between instance embeddings further helps the network learn the relationships between point-level embeddings.
As shown in Figure 3, the hierarchical embeddings output by deformable cross-attention are fed into instance self-attention. After merging the point dimension and the channel dimension, the embedding is reshaped from $(N_{ins}, N_{pt}, C)$ to $(N_{ins}, N_{pt} \times C)$. Subsequently, the hierarchical embedding is passed through an embed layer composed of multiple MLPs to obtain the instance queries. The queries are fed into multi-head self-attention to capture the topological relationships between instances and obtain the instance embeddings. To incorporate instance-level information into the point-level embeddings, we sum the instance embeddings and the hierarchical embeddings. The summed features are fed into point self-attention, which lets the point features within each instance interact and further refines the topological relationships between point sequences.
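The cascaded design can be sketched as follows in PyTorch; this is a minimal reconstruction from the description above, and the layer sizes, head counts, and module names are assumptions rather than the official implementation.

```python
import torch
import torch.nn as nn

class InstanceInteractiveAttention(nn.Module):
    """Sketch of IIA: instance self-attention on flattened per-instance
    features, then point self-attention within each instance."""

    def __init__(self, num_points: int = 20, dim: int = 256):
        super().__init__()
        # Embed layer: compress the flattened (num_points * dim) instance
        # representation back to a dim-sized instance query.
        self.embed = nn.Sequential(
            nn.Linear(num_points * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.instance_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.point_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, hier: torch.Tensor) -> torch.Tensor:
        # hier: (B, N_ins, N_pt, C) hierarchical embeddings output by
        # deformable cross-attention.
        b, n_ins, n_pt, c = hier.shape

        # Merge point and channel dims, embed, then instance self-attention.
        inst_q = self.embed(hier.reshape(b, n_ins, n_pt * c))   # (B, N_ins, C)
        inst_emb, _ = self.instance_attn(inst_q, inst_q, inst_q)

        # Sum instance embeddings onto every point-level embedding.
        fused = hier + inst_emb[:, :, None, :]

        # Point self-attention within each instance.
        pts = fused.reshape(b * n_ins, n_pt, c)
        pts, _ = self.point_attn(pts, pts, pts)
        return pts.reshape(b, n_ins, n_pt, c)

# Usage sketch:
# out = InstanceInteractiveAttention()(torch.randn(2, 50, 20, 256))
```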
A high-definition map contains vectorized static map elements, including lane lines, curbs, and crosswalks. ADMap proposes the Vector Direction Difference Loss (VDDL) for both the open shapes (lane lines, curbs) and the closed shapes (crosswalks). We model the directions of the vectors along the point sequence within an instance, so that the direction at each point can be supervised in more detail through the difference between the predicted and ground-truth vector directions. In addition, points where adjacent ground-truth vectors differ sharply in direction mark drastic changes in the scene topology, are harder to predict, and need more attention from the model. Therefore, points with larger ground-truth vector direction differences are given greater weight, ensuring that the network can accurately predict these points of drastic change.
Figure 4 shows the predicted point sequence $\{\hat{p}_0, \hat{p}_1, \ldots, \hat{p}_{N-1}\}$ and the ground-truth point sequence $\{p_0, p_1, \ldots, p_{N-1}\}$, together with the initial modeling of the predicted vector lines $\{\hat{v}_0, \ldots, \hat{v}_{N-2}\}$ and the ground-truth vector lines $\{v_0, \ldots, v_{N-2}\}$, where $\hat{v}_n = \hat{p}_{n+1} - \hat{p}_n$ and $v_n = p_{n+1} - p_n$. To ensure that opposite directions do not receive the same loss, we calculate the cosine of the angle difference $\theta'_n$ between each pair of predicted and ground-truth vector lines:

$$\cos\theta'_n = \frac{\hat{v}_n \cdot v_n}{\|\hat{v}_n\|\,\|v_n\|}$$
where $\cdot$ accumulates the products of the vector lines' coordinate positions (the dot product) and $\|\cdot\|$ represents the normalization operation. We use the angle difference $\Delta\theta_n$ between the adjacent ground-truth vector lines $v_{n-1}$ and $v_n$ at each point of the real instance to assign weights of different sizes to the points. The weight is defined as follows:

$$w_n = \frac{N \cdot \exp(\Delta\theta_n)}{\sum_{m=1}^{N-2} \exp(\Delta\theta_m)}$$
where $N$ represents the number of points in the instance and $\exp(\cdot)$ represents the exponential function with base $e$. Since the vector angle difference cannot be calculated at the first and last points, we set their weights to 1. When the vector angle difference in the ground truth becomes larger, we give that point a greater weight, which makes the network pay more attention to sharply changing map topology. The angle difference loss of each point in the point sequence is defined as:

$$\mathcal{L}_n = \begin{cases} w_0 \left(1 - \cos\theta'_0\right), & n = 0 \\ w_n \left(1 - \dfrac{\cos\theta'_{n-1} + \cos\theta'_n}{2}\right), & 0 < n < N-1 \\ w_{N-1} \left(1 - \cos\theta'_{N-2}\right), & n = N-1 \end{cases}$$
We use $1 - \cos\theta'$ to adjust each term of the loss into the interval $[0.0, 2.0]$. By combining the cosines of the angle differences of the vector lines adjacent to each point, this loss covers the geometric topology information of each point more comprehensively. Since the first and last points have only one adjacent vector line, their loss is the cosine term of that single vector angle difference.
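Putting the pieces together, here is a minimal PyTorch sketch of a vector-direction-difference loss as we reconstructed it from the description above; the exact weighting and reduction in the released ADMap code may differ.

```python
import torch

def vector_direction_difference_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Sketch of a VDDL-style loss for one instance.
    pred, gt: (N, 2) predicted and ground-truth point sequences."""
    n = pred.shape[0]
    v_pred = pred[1:] - pred[:-1]          # predicted vector lines, (N-1, 2)
    v_gt = gt[1:] - gt[:-1]                # ground-truth vector lines

    eps = 1e-6
    # Cosine of the angle difference between each predicted/GT vector pair.
    cos_diff = (v_pred * v_gt).sum(-1) / (
        v_pred.norm(dim=-1) * v_gt.norm(dim=-1) + eps
    )

    # GT angle difference at each interior point (between adjacent GT
    # vectors), turned into softmax weights scaled by N; endpoints get 1.
    cos_adj = (v_gt[:-1] * v_gt[1:]).sum(-1) / (
        v_gt[:-1].norm(dim=-1) * v_gt[1:].norm(dim=-1) + eps
    )
    delta = torch.acos(cos_adj.clamp(-1 + eps, 1 - eps))    # (N-2,)
    w = torch.ones(n, device=pred.device)
    w[1:-1] = n * torch.softmax(delta, dim=0)

    # Per-point loss in [0, 2]: average the cosine terms of the adjacent
    # vector lines; endpoints use their single adjacent vector line.
    loss = torch.empty(n, device=pred.device)
    loss[0] = 1.0 - cos_diff[0]
    loss[-1] = 1.0 - cos_diff[-1]
    loss[1:-1] = 1.0 - 0.5 * (cos_diff[:-1] + cos_diff[1:])
    return (w * loss).mean()

# Usage sketch:
# loss = vector_direction_difference_loss(torch.randn(20, 2), torch.randn(20, 2))
```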
For fair evaluation, we divide the map elements into three types: lane lines, road boundaries, and crosswalks. Average precision (AP) is used to evaluate the quality of map construction, and the sum of the chamfer distances between the predicted point sequence and the ground-truth point sequence determines whether the two match. The chamfer distance thresholds are set to {0.5, 1.0, 1.5} m; we compute AP under each of these three thresholds and report the average as the final metric.
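For reference, here is a minimal sketch of the bidirectional chamfer distance used for matching, using NumPy and SciPy; threshold matching and AP accumulation are only outlined in the comment.

```python
import numpy as np
from scipy.spatial.distance import cdist

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Bidirectional chamfer distance between two (N, 2) point sequences:
    the sum of mean nearest-neighbor distances in both directions."""
    d = cdist(pred, gt)                    # pairwise Euclidean distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# A prediction matches a ground-truth instance at threshold t (0.5/1.0/1.5 m)
# if chamfer_distance(pred, gt) < t; AP is computed per threshold and averaged.
```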
Table 1 reports the metrics of ADMap and state-of-the-art methods on the nuScenes dataset. Under the camera-only framework, ADMap improves mAP by 5.5% over its baseline (MapTR), and ADMapv2 improves by 1.4% over its baseline (MapTRv2). ADMapv2 reaches an mAP of up to 82.8%, the best performance among current benchmarks. Some details will be disclosed in subsequent arXiv versions. In terms of speed, ADMap significantly improves performance over its baseline at a slightly lower FPS. It is worth mentioning that ADMapv2 improves not only performance but also inference speed.
Table 2 reports the metrics of ADMap and state-of-the-art methods on Argoverse2. Under the camera-only framework, ADMap and ADMapv2 improve over their baselines by 3.4% and 1.3%, respectively. Under the multi-modal framework, ADMap and ADMapv2 achieve the best performance, with mAP of 75.2% and 76.9%, respectively. In terms of speed, ADMapv2 reduces latency by 11.4 ms compared with MapTRv2.
In Table 3, we provide ablation experiments for each module of ADMap on the nuScenes benchmark.
Table 4 shows the impact of inserting different attention mechanisms on the final performance. DSA stands for decoupled self-attention, and IIA for instance interactive attention. The results show that IIA improves mAP by 1.3% compared with DSA.
Table 5 reports the impact on mAP of adding backbone and neck layers after feature fusion. Adding backbone and neck layers based on SECOND increases mAP by 1.2%. Adding MPN increases the model's mAP by 2.0% without increasing inference time.
Table 6 reports the performance impact of adding VDDL on the nuScenes benchmark. When the loss weight is set to 1.0, mAP is highest, reaching 53.3%.
Table 7 reports the impact of the number of MPN downsampling layers on the final performance on the nuScenes benchmark. The more downsampling layers, the slower the model's inference. To balance speed and performance, we set the number of downsampling layers to 2.
To verify that ADMap effectively alleviates the point-order disturbance problem, we propose the average chamfer distance (ACE). We select the predicted instances whose chamfer distance sum is less than 1.5 and calculate their average chamfer distance: the smaller the ACE, the more accurate the instance's point-order prediction. Table 8 shows that ADMap effectively alleviates the point-order disturbance problem.
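A minimal sketch of how ACE could be computed; the 1.5 cutoff follows the text, while the helper functions and their signatures are our own assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    # Same bidirectional chamfer distance as in the earlier sketch.
    d = cdist(pred, gt)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def average_chamfer_distance(matched_pairs) -> float:
    """ACE sketch: average the chamfer distance over matched predictions
    whose chamfer distance sum is below 1.5, per the text above."""
    dists = [chamfer_distance(p, g) for p, g in matched_pairs]
    kept = [d for d in dists if d < 1.5]
    return float(np.mean(kept)) if kept else float("nan")
```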
The following two figures show visualization results on the nuScenes and Argoverse2 datasets.
ADMap is an efficient and effective vectorized high-definition map reconstruction framework that effectively alleviates the jitter or jaggedness that prediction bias can introduce into the point order of instance vectors. Extensive experiments show that our proposed method achieves the best performance on both the nuScenes and Argoverse2 benchmarks. We believe that ADMap will help advance research on vectorized high-definition map reconstruction and thereby better promote the development of autonomous driving and other fields.