Table of Contents
Foreword & the author’s personal understanding
Review of related work
MapTracker
Memory Buffers
BEV Module
VEC Module
Training
Consistent Vector HD Mapping Benchmarks
Consistent ground truth
Consistency-aware mAP metric
Experiments
Results with geographically non-overlapping data

Can online maps still be like this? MapTracker: Use tracking to realize the new SOTA of online maps!

Apr 25, 2024, 05:01 PM

Foreword & the author’s personal understanding

This paper presents an algorithm for online high-definition (HD) map construction. The method, MapTracker, accumulates a sensor stream into memory buffers of two representations: 1) raster latents in bird's-eye-view (BEV) space and 2) vector latents over road elements (i.e., crosswalks, lane lines, and road boundaries). The method borrows the query propagation paradigm from object tracking to explicitly associate tracked road elements of the previous frame with the current frame, while fusing a subset of memory latents selected with distance strides to enhance temporal consistency.


Open source link: https://map-tracker.github.io/

In summary, the main contributions of this article are as follows:

  • A new vector HD mapping algorithm that formulates HD mapping as a tracking task and exploits the history of memory latents in both representations to achieve temporal consistency;
  • An improved vector HD mapping benchmark, with temporally consistent GT and a consistency-aware mAP metric;
  • SOTA performance! Significant improvements over the current best methods on traditional and new metrics.

This paper considers the problem of consistent vector HD mapping. We first review recent trends in visual object tracking with Transformers, then memory designs in vision-based autonomous driving, and finally discuss competing vector HD mapping methods.

Visual object tracking with Transformers. Visual object tracking has a long history; end-to-end Transformer methods have become a recent trend due to their simplicity. TrackFormer, TransTrack, and MOTR leverage attention mechanisms and track queries to explicitly associate instances across frames. MeMOT and MeMOTR further extend tracking Transformers with a memory mechanism for better long-term consistency. This paper formulates vector HD mapping as a tracking task by combining track queries with a more robust memory mechanism.

Memory designs in autonomous driving. Single-frame autonomous driving systems struggle with occlusions, sensor failures, and complex environments; temporal modeling with memory is a promising remedy. Many memory designs exist for the raster BEV features that underpin most autonomous driving tasks. BEVDet4D and BEVFormerv2 stack the features of multiple past frames as memory, but the computation grows linearly with the history length, making it hard to capture long-term information. VideoBEV propagates BEV raster queries across frames to accumulate information recurrently. In the vector domain, Sparse4Dv2 uses a similar RNN-style memory for object queries, while Sparse4Dv3 further uses temporal denoising for robust temporal learning. These ideas have been partially incorporated into vector HD mapping methods. This paper proposes a new memory design for both the raster BEV latents and the vector latents of road elements.

Vector HD mapping. Traditionally, HD maps are reconstructed offline with SLAM-based methods and then manually curated, incurring high maintenance costs. As accuracy and efficiency improve, online vector HD mapping algorithms have attracted more attention than offline ones, since they simplify the production process and can handle map changes. HDMapNet converts raster segmentation into vector instances via post-processing and established the first vector HD mapping benchmark. VectorMapNet and MapTR both use DETR-based Transformers for end-to-end prediction: the former autoregressively predicts the vertices of each detected curve, while the latter uses hierarchical queries and a matching loss to predict all vertices simultaneously. MapTRv2 further complements MapTR with auxiliary tasks and network modifications. Curve representations, network designs, and training paradigms are the focus of other work. StreamMapNet takes a step toward consistent mapping by borrowing the streaming idea from BEV perception: past information is accumulated into memory latents and passed as conditions (i.e., a conditional detection framework). SQD-MapNet imitates DN-DETR and proposes temporal curve denoising to promote temporal learning.

MapTracker

[Figure 2: Overview of the MapTracker architecture]

A robust memory mechanism is the core of MapTracker. It accumulates the sensor stream into latent memories in two representations: 1) bird's-eye-view (BEV) memory of the area around the vehicle, a latent image in a top-down BEV coordinate frame; and 2) vector (VEC) memory of road elements (i.e., crosswalks, lane lines, and road boundaries), a set of latent vectors.

Two simple ideas together with the memory mechanism achieve consistent mapping. The first is to use a buffer of historical memories instead of a single memory for the current frame. A single memory would have to hold information for the entire history, but it tends to forget, especially in cluttered environments with many vehicles occluding the road structure. For efficiency and coverage, we select a subset of past memory latents for fusion at each frame based on vehicle motion. The second idea is to formulate online HD mapping as a tracking task. The VEC memory mechanism maintains the sequence of memory latents for each road element, making this formulation simple by borrowing the query propagation paradigm from the tracking literature. The remainder of this section explains the neural architecture (see Figures 2 and 3), including the BEV and VEC memory buffers and their corresponding network modules, and then the training details.
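The motion-based selection of past latents can be sketched as follows; the four-frame budget, the fixed metre stride, and the function interface are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def select_memory_frames(ego_xy, num_select=4, stride_m=5.0):
    """Pick a subset of buffered frames whose ego positions are roughly
    stride_m metres apart, always including the current (last) frame.
    ego_xy: (T, 2) past ego positions, oldest first. Returns frame indices."""
    ego_xy = np.asarray(ego_xy, dtype=float)
    selected = [len(ego_xy) - 1]               # always keep the current frame
    anchor = ego_xy[-1]
    for i in range(len(ego_xy) - 2, -1, -1):   # walk backwards in time
        if len(selected) == num_select:
            break
        if np.linalg.norm(ego_xy[i] - anchor) >= stride_m:
            selected.append(i)
            anchor = ego_xy[i]
    return sorted(selected)
```

With a vehicle moving 1 m per frame this picks frames spaced about 5 m apart, while a stationary vehicle falls back to the current frame alone, which is the point of keying the selection on motion rather than time.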

Memory Buffers

BEV memory is a 2D latent in the BEV coordinate frame, centered on and oriented with the vehicle at frame t. Its spatial dimensions (i.e., 50×100) cover a rectangular area of 15 m to the left/right and 30 m to the front/back. Each memory latent accumulates the entire past, and the buffer keeps the latents of the last 20 frames, making the memory mechanism redundant but robust.

VEC memory is a set of vector latents. Each vector latent accumulates information about one active road element up to frame t; the number of active elements changes from frame to frame. The buffer holds the vector latents of the past 20 frames along with their cross-frame correspondences (i.e., the sequences of vector latents that belong to the same road element).
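The two buffers can be sketched as a pair of fixed-length queues; the shapes are the ones quoted in the text, while the class and field names are illustrative assumptions.

```python
from collections import deque

class MemoryBuffers:
    """Minimal sketch of the two 20-frame memory buffers."""

    def __init__(self, max_len=20):
        self.bev = deque(maxlen=max_len)   # each entry: a (C, 50, 100) BEV latent
        self.vec = deque(maxlen=max_len)   # each entry: {track_id: latent} for active elements

    def push(self, bev_latent, vec_latents):
        """Append the current frame's latents; old frames fall off automatically."""
        self.bev.append(bev_latent)
        self.vec.append(vec_latents)

    def vec_history(self, track_id):
        """Latents of one road element across buffered frames (oldest first)."""
        return [frame[track_id] for frame in self.vec if track_id in frame]
```

The per-element `vec_history` is what realizes the "latent sequence of the same road element" described above.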

BEV Module

[Figure 3: The BEV and VEC module architectures]

The input is 1) CNN features of the onboard surround-view images, produced by the image backbone, together with the camera parameters; 2) the BEV memory buffer; and 3) the vehicle motion. The four components of the BEV module architecture and its outputs are explained below.

  • BEV Query Propagation: BEV memory is a 2D latent image in the vehicle coordinate frame. An affine transformation with bilinear interpolation initializes the current BEV memory from the previous one. Pixels that fall outside the previous latent image after the transformation are initialized with per-pixel learnable embeddings; this operation is denoted "MaskBlend" in Figure 3.
  • Deformable Self-Attention: A deformable self-attention layer enriches the BEV memory.
  • Perspective-to-BEV Cross-Attention: Similar to StreamMapNet, BEVFormer's spatial deformable cross-attention layers inject perspective-view information into MBEV(t).
  • BEV Memory Fusion: Memory latents in the buffer are fused to enrich MBEV(t). Using all 20 memories would be computationally expensive and redundant, so a strided subset is selected based on vehicle motion.
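The query-propagation step (warp, then fill newly visible pixels with learnable embeddings) can be sketched with standard PyTorch warping primitives; the function name, the affine-matrix convention, and treating MaskBlend as a hard inside/outside blend are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def propagate_bev_memory(prev_bev, theta, learned_embed):
    """Warp the previous BEV latent into the current vehicle frame and fill
    newly visible pixels with a learnable per-pixel embedding ("MaskBlend").

    prev_bev:      (1, C, H, W) previous-frame BEV memory
    theta:         (1, 2, 3) affine matrix mapping the current grid into the
                   previous frame, in normalized coordinates
    learned_embed: (1, C, H, W) learnable initialization for uncovered pixels
    """
    grid = F.affine_grid(theta, prev_bev.shape, align_corners=False)
    warped = F.grid_sample(prev_bev, grid, align_corners=False, padding_mode="zeros")
    # warp a tensor of ones the same way: pixels that sampled outside get ~0
    ones = torch.ones_like(prev_bev[:, :1])
    valid = F.grid_sample(ones, grid, align_corners=False, padding_mode="zeros")
    inside = (valid > 0.999).float()
    return inside * warped + (1.0 - inside) * learned_embed
```

An identity transform leaves the memory unchanged, while a transform that moves the sampling window entirely off the old latent returns the learned embedding everywhere.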

The output is 1) the final memory MBEV(t), saved to the buffer and passed to the VEC module; and 2) the rasterized road-element geometry S(t), inferred by a segmentation head and used for the loss computation. The segmentation head is a linear projection that maps each pixel of the memory latent to a 2×2 patch of the segmentation mask, turning the 50×100 latent into a 100×200 mask.
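The per-pixel linear projection to a 2×2 patch is equivalent to a 1×1 convolution followed by a pixel shuffle; this is a sketch of that head, with the class name and channel counts as illustrative assumptions.

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Maps each BEV-latent pixel to a 2x2 patch of the segmentation mask,
    so a (C, 50, 100) latent yields a (K, 100, 200) mask for K classes."""

    def __init__(self, in_ch, num_classes=3):
        super().__init__()
        # 1x1 conv == per-pixel linear projection to 2*2*K logits
        self.proj = nn.Conv2d(in_ch, num_classes * 4, kernel_size=1)
        self.shuffle = nn.PixelShuffle(2)  # rearrange the 4 sub-pixels into 2x2

    def forward(self, bev_latent):
        return self.shuffle(self.proj(bev_latent))
```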

VEC Module

The input is the BEV memory MBEV(t), the vector memory buffer, and the vehicle motion. The four components of the VEC module are:

  • Vector Query Propagation: Vector memory is a set of latent vectors of the active road elements, propagated from the previous frame.
  • Vector Instance Self-Attention: A standard self-attention layer across instances.
  • BEV-to-Vector Cross-Attention: Multi-point attention against the BEV memory.
  • Vector Memory Fusion: For each latent vector in the current memory MVEC(t), the buffered latents associated with the same road element are fused to enrich its representation. The same strided frame selection picks four latents; for road elements with a short tracking history, the selected frames π(t) are fewer and differ. For example, an element tracked for two frames has only two latents in the buffer.
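The per-element selection behind the fusion step can be sketched as follows; the fallback-to-recent-frames rule and the interface are illustrative assumptions for how short histories yield fewer latents.

```python
def select_track_latents(track, strided_frames, num_select=4):
    """track: {frame_index: latent} history of one road element.
    strided_frames: frame indices chosen by the BEV-style stride selection.
    Elements with a short history return fewer latents; an element whose
    history misses the strided frames falls back to its most recent frames.
    (Illustrative interface, not the paper's exact rule.)"""
    picked = [f for f in strided_frames if f in track]
    if len(picked) < min(num_select, len(track)):
        # short or misaligned history: take the most recent frames instead
        picked = sorted(track)[-num_select:]
    return [track[f] for f in sorted(set(picked))]
```

A long track yields the four strided latents, while a two-frame track yields exactly its two buffered latents, matching the example in the text.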

The output is 1) the final memory MVEC(t) of the "positive" road elements, i.e., those that pass the classification test of a single fully connected layer, saved to the buffer; and 2) the vector road geometry of the positive elements, regressed by a 3-layer MLP.

Training

BEV loss:

[Equation: BEV loss]

VEC loss. Inspired by MOTR, an end-to-end Transformer for multi-object tracking, we extend the matching-based loss to explicitly account for GT tracking. The optimal instance-level label assignment for newly appearing elements is defined as:

[Equation: optimal instance-level label assignment for new elements]

Then the label assignment ω(t) between all outputs and GT is defined inductively:

[Equation: inductive definition of the label assignment ω(t)]

The tracking style loss of vector output is:

[Equation: tracking-style loss on the vector outputs]

Transformation loss. We borrow the transformation loss Ltrans from StreamMapNet to train the PropMLP; it forces query transformations in the latent space to preserve vector geometry and class type. The final training loss is:

[Equation: final training loss]

Consistent Vector HD Mapping Benchmarks

Consistent ground truth

MapTR created a vector HD mapping benchmark from the nuScenes and Argoverse2 datasets, which many subsequent studies adopted. However, crosswalks are naively merged together and are inconsistent across frames. Dividers are also inconsistent due to failures of its graph tracing process (for Argoverse2).

StreamMapNet inherits the code of VectorMapNet and creates a benchmark with better fidelity that has been used in a workshop challenge. However, some problems remain. For Argoverse2, dividers are sometimes split into shorter segments. For nuScenes, large crosswalks are sometimes split into small loops, and these inconsistencies appear randomly in each frame, yielding temporally inconsistent representations. Visualizations of these existing benchmark problems are provided in the appendix.

We improved the processing code of the existing benchmarks to (1) enhance the per-frame GT geometry and then (2) compute cross-frame correspondences to form GT "trajectories".

(1) Enhancing per-frame geometry. We inherited and improved the MapTR codebase, popular in the community, with two changes: replacing the crosswalk processing with that of StreamMapNet and improving its quality with more geometric constraints; and enhancing the graph tracing algorithm to handle noise in the original annotations, improving the temporal consistency of the divider processing (Argoverse2 only).

(2) Forming tracks. Given the road-element geometry in each frame, we solve an optimal bipartite matching problem between each pair of adjacent frames to establish correspondences between road elements; chained correspondences form the trajectories. The matching score between a pair of road elements is defined as follows. A road element's geometry is a polygonal curve or loop. We transform the geometry from the old frame into the new frame based on vehicle motion, then rasterize the two curves/loops with a certain thickness into instance masks; their intersection over union is the matching score.
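The adjacent-frame matching step can be sketched as follows; the grid size, thickness, sampling density, and IoU threshold are illustrative assumptions, not the benchmark's exact values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def rasterize(polyline, grid=(50, 100), thickness=1):
    """Rasterize a polyline ((N, 2) array of (col, row) cell coordinates)
    into a boolean mask by densely sampling each segment. A crude stand-in
    for the benchmark's rasterizer."""
    mask = np.zeros(grid, dtype=bool)
    for (c0, r0), (c1, r1) in zip(polyline[:-1], polyline[1:]):
        for t in np.linspace(0.0, 1.0, 64):
            r = int(round(r0 + t * (r1 - r0)))
            c = int(round(c0 + t * (c1 - c0)))
            mask[max(r - thickness, 0): r + thickness + 1,
                 max(c - thickness, 0): c + thickness + 1] = True
    return mask

def match_adjacent_frames(prev_elems, curr_elems, min_iou=0.1):
    """Optimal bipartite matching between road elements of adjacent frames,
    scored by mask IoU (prev_elems assumed already transformed into the
    current vehicle frame by the ego motion)."""
    iou = np.zeros((len(prev_elems), len(curr_elems)))
    for i, a in enumerate(prev_elems):
        ma = rasterize(np.asarray(a))
        for j, b in enumerate(curr_elems):
            mb = rasterize(np.asarray(b))
            union = (ma | mb).sum()
            iou[i, j] = (ma & mb).sum() / union if union else 0.0
    rows, cols = linear_sum_assignment(-iou)        # maximize total IoU
    return [(i, j) for i, j in zip(rows, cols) if iou[i, j] >= min_iou]
```

Linking the matched pairs frame by frame yields the GT trajectories described above.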

Consistency-aware mAP metric

The standard mAP metric does not penalize temporally inconsistent reconstructions. We match reconstructed road elements to ground truth independently in each frame using Chamfer distance, as in the standard mAP procedure, and then eliminate temporally inconsistent matches with the following checks. First, for baseline methods that do not predict tracking information, we form trajectories of reconstructed road elements with the same algorithm used to obtain the GT temporal correspondences (we also extend the algorithm to re-identify missing elements, trading off some speed; see the appendix for details). Next, let the "ancestors" of an element be the road elements belonging to the same trajectory in earlier frames. Starting from the beginning of the sequence, we mark a per-frame match (reconstructed element and GT element) as temporally inconsistent and remove it if any of their ancestors were not matched to each other. The remaining temporally consistent matches are then used to compute the standard mAP.
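The ancestor check can be sketched as the following inductive filter; the interface (per-frame match dictionaries and parent maps) is a simplified illustration of the metric, not the benchmark's exact implementation.

```python
def filter_consistent_matches(per_frame_matches, pred_parent, gt_parent):
    """Remove temporally inconsistent per-frame matches before mAP.

    per_frame_matches: per frame, {pred_id: gt_id} Chamfer-based matches.
    pred_parent / gt_parent: per frame, map an element id to its id in the
    previous frame (None if the element newly appeared)."""
    kept = []
    for t, matches in enumerate(per_frame_matches):
        kept_t = {}
        for pid, gid in matches.items():
            pp = pred_parent[t].get(pid)
            gp = gt_parent[t].get(gid)
            if pp is None and gp is None:
                kept_t[pid] = gid                    # both newly appeared
            elif pp is not None and gp is not None and kept and kept[-1].get(pp) == gp:
                kept_t[pid] = gid                    # ancestors were also matched
            # otherwise the match disagrees with history and is dropped
        kept.append(kept_t)
    return kept
```

Because each frame's check consults only the *kept* matches of the previous frame, any inconsistency with an earlier ancestor propagates forward, matching the "from the beginning of the sequence" induction in the text.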

Experiments

We built our system on the StreamMapNet codebase and trained our model with 8 NVIDIA RTX A5000 GPUs for 72 epochs on nuScenes and 35 epochs on Argoverse2. The batch sizes of the three training stages are 16, 48, and 16, respectively. Training takes about three days, and inference runs at about 10 FPS. After describing the datasets, metrics, and baseline methods, this section presents the experimental results.

[Table 1: Results on nuScenes]

[Table 2: Results on Argoverse2]

One of our contributions is temporally consistent GT, which we compare against the two existing counterparts, namely the MapTR and StreamMapNet ground truths (GT). Tables 1 and 2 show results of training and testing each system on one of the three GTs (shown in the first column). Since our codebase is based on StreamMapNet, we evaluate our system on the StreamMapNet GT and our consistent GT.

nuScenes results. Table 1 shows that both MapTRv2 and StreamMapNet achieve better mAP with our GT, as expected after fixing the inconsistencies in their original GTs. StreamMapNet's improvement is slightly larger because it has temporal modeling (while MapTR does not) and exploits the temporal consistency of the data. MapTracker significantly outperforms the competing methods, improving by more than 8% and 22% in the standard and consistency-aware mAP scores, respectively, on our consistent GT. Note that MapTracker is the only system that produces explicit tracking information (i.e., correspondences of reconstructed elements across frames), which the consistency-aware mAP requires; a simple matching algorithm creates trajectories for the baseline methods.

Argoverse2 results. Table 2 shows that both MapTRv2 and StreamMapNet achieve better mAP scores with our consistent GT, which, besides being temporally consistent, also has higher-quality geometry (for crosswalks and dividers) that benefits all methods. MapTracker outperforms all baselines in all settings by a significant margin (i.e., 11% or 8%, respectively). The consistency-aware mAP (C-mAP) further demonstrates our superior consistency, improving over StreamMapNet by more than 18%.

Results with geographically non-overlapping data

[Table 3: Results on geographically non-overlapping splits]

The official train/test splits of the nuScenes and Argoverse2 datasets have geographic overlap (i.e., the same roads appear in both training and testing), which allows overfitting. Table 3 compares the best baseline method, StreamMapNet, and MapTracker on geographically non-overlapping splits. MapTracker consistently outperforms by a significant margin, demonstrating strong cross-scene generalization. Note that performance on nuScenes drops for both methods: on careful inspection, the detection of road elements succeeds, but the regressed coordinates have large errors, resulting in poor scores. The appendix provides additional analysis.

Ablation studies

[Table 4: Ablation studies]

The ablation studies in Table 4 demonstrate the contributions of the key designs of MapTracker. The first, "baseline" entry is StreamMapNet stripped of temporal reasoning (i.e., without the BEV and vector stream memories and modules). The second entry is the full StreamMapNet. Both were trained for 110 epochs until full convergence. The last three entries are variants of MapTracker with or without the key designs. The first variant discards the memory fusion components in the BEV/VEC modules; it uses the tracking formulation but relies on a single BEV/VEC memory to hold past information. The second variant adds the memory buffers and memory fusion components but without strides, i.e., fusing the latest four frames. It improves performance, demonstrating the effectiveness of our memory mechanism. The last variant adds the memory strides, using the memory mechanism more effectively and further improving performance.

Qualitative evaluations

[Figure 4: Qualitative comparisons]

Figure 4 qualitatively compares MapTracker with the baseline methods on the nuScenes and Argoverse2 datasets. For better visualization, a simple algorithm merges the per-frame vector HD maps into a global vector HD map; see the appendix for the merging algorithm and per-frame visualizations. MapTracker produces more accurate and cleaner results, showing superior overall quality and temporal consistency. In scenarios where the vehicle turns or is not simply moving forward (including the two examples in Figure 1), StreamMapNet and MapTRv2 can produce unstable results, yielding broken and noisy merged maps. This is mainly because detection-based formulations struggle to maintain temporally coherent reconstructions under complex vehicle motions.

Conclusion

This paper introduced MapTracker, which formulates online HD mapping as a tracking task and leverages a history of raster and vector latents to maintain temporal consistency. A query propagation mechanism associates tracked road elements across frames, and a selected subset of memory entries is fused with distance strides to enhance consistency. We also improved existing benchmarks by generating consistent GT with tracking labels and augmenting the standard mAP metric with temporal-consistency checks. MapTracker significantly outperforms existing methods on the nuScenes and Argoverse2 datasets under traditional metrics, and demonstrates superior temporal consistency under our consistency-aware metric.

Limitations: We identify two limitations of MapTracker. First, the current tracking formulation does not handle merging and splitting of road elements (e.g., a U-shaped boundary splitting into two straight lines in later frames, or vice versa); the ground truth does not represent these events properly either. Second, our system runs at about 10 FPS and falls short of real-time performance. Optimizing efficiency and handling more complex real-world road structures are our future work.
