ICLR'24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness-AI-php.cn

Table of Contents

Written in front&The author’s personal understanding

Review of Related Work

Detailed explanation of LaneSegNet

Lane Segment Perception Task Description

LaneSegNet Framework

Experimental results

Main experimental structure

Ablation Experiment

Conclusion

Home

Technology peripherals

ICLR'24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jan 19, 2024 am 11:12 AM

network Model

Written in front&The author’s personal understanding

Maps are key information for downstream applications of autonomous driving systems, and are usually represented by lanes or center lines. However, the existing map learning literature mainly focuses on detecting geometry-based topological relationships of lanes or sensing centerlines. Both methods ignore the inherent relationship between lane lines and center lines, that is, lane lines bind center lines. Although simply predicting two types of lanes in one model are mutually exclusive in the learning goal, this paper proposes lane segmentation as a new representation that seamlessly combines geometric and topological information, thus proposing LaneSegNet. This is the first end-to-end mapping network that generates lane segments to obtain a complete representation of road structure. LaneSegNet has two key modifications. One is the lane attention module, which is used to capture key area details within long-distance feature space. The other is the same initialization strategy of the reference point, which enhances the learning of position priors for lane attention. On the OpenLane-V2 dataset, LaneSegNet has significant advantages over previous similar products in three tasks, namely map element detection (4.8 mAP), lane centerline perception (6.9 DETl) and newly defined lane segment perception (5.6 mAP). Additionally, it achieved a real-time inference speed of 14.7FPS.

Open source link: https://github.com/OpenDriveLab/LaneSegNet

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

In summary, the main contributions of this article are as follows:

This article introduces a new lane segment perception as a new map learning formula. It contains geometric and topological elements. We hope it will bring new insights to the field.
This article proposes LaneSegNet, an end-to-end network proposed for lane segment awareness. Two new modifications have been proposed, including a lane attention module with heads-to-regions mechanism to capture long-range attention, and the same initialization strategy for reference points to enhance the location prior of lane attention. study.

Centerline Perception: Centerline Perception from Vehicle-mounted Sensor Data (Compared to Lane Map Learning in this paper same) has attracted a great deal of attention recently. STSU proposed a DETR-like network to detect centerlines, followed by a multilayer perceptron (MLP) module to determine their connectivity. Based on STSU, Can et al. introduced an additional minimum loop query to ensure the correct order of overlapping rows. CenterLineDet treats center lines as vertices and designs a graph update model trained through imitation learning. It is worth noting that Tesla proposed the concept of "lane language" to express the lane map as a sentence. Their attention-based model recursively predicts lane markings and their connectivity. In addition to these segmentation methods, LaneGAP also introduces a path method that uses an additional transformation algorithm to recover the lane map. TopoNet targets complete and diverse driving scene graphs, explicitly models the connectivity of center lines within the network, and incorporates traffic elements into the task. In this work, we adopt the segment method to construct lane graphs. However, we differ from previous methods in modeling Lane Segments instead of taking the centerline as the vertex of the lane graph, which allows convenient integration of segment-level geometric and semantic information.

Map element detection: In previous work, people focused on improving map element detection from the camera plane to 3D space to overcome projection errors. With the popular trend of BEV sensing, recent works focus on learning HD maps using segmentation and vectorization methods. Map segmentation predicts the semantics of each pure BEV grid, such as lanes, crosswalks, and drivable areas. These works mainly differ in perspective view (PV) to BEV conversion modules. However, segmented maps cannot provide direct information used by downstream modules. HDMapNet handles this problem by grouping and vectorizing segmentation maps with complex post-processing.

Although dense segmentation provides pixel-level information, it still cannot touch the complex relationships of overlapping elements. VectorMapNet proposes to represent each map element directly as a sequence of points, using coarse keypoints to sequentially decode lane locations. MapTR explores a unified permutation-based point sequence modeling approach to eliminate modeling ambiguity and improve performance and efficiency. PivotNet further models map elements using pivot-based representation in an ensemble prediction framework to reduce redundancy and improve accuracy. StreamMapNet utilizes multi-point attention and temporal information to improve the stability of remote map element detection. In fact, since vectorization also enriches the direction information of lanes, vectorization-based methods can be easily adapted to centerline awareness through alternating supervision. In this work, we propose a unified, easy-to-learn representation—lane segmentation—for all HD map elements on a road.

Detailed explanation of LaneSegNet

Lane Segment Perception Task Description

Instances of Lane Segment contain geometric and semantic aspects of the road. As for geometry, it can be represented as a line segment consisting of a vectorized centerline and its corresponding lane boundary: . Each line is defined as an ordered collection of points in 3D space. Alternatively, the geometry can be described as a closed polygon that defines the drivable area within that lane.

In terms of semantics, it includes Lane Segment category C (e.g., Lane Segment, pedestrian crossing) and the line style of the left/right lane boundary (e.g., invisible, solid, dashed line): {}. These details provide autonomous vehicles with important insights into deceleration requirements and the feasibility of lane changes.

In addition, topology information plays a crucial role in path planning. To represent this information, a lane graph is constructed for Lane Segment, represented as G = (V, E). Each Lane Segment is a node in the graph, represented by the set V, and the edges in the set E describe the connectivity between Lane Segments. We use an adjacency matrix to store this lane graph, where matrix element (i, j) is set to 1 only when the j-th Lane Segment follows the i-th Lane Segment; otherwise, it remains 0.

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

LaneSegNet Framework

The overall framework of LaneSegNet is shown in Figure 2. LaneSegNet takes surround images as input to perceive Lane Segments within a specific BEV range. In this section, we first briefly introduce the LaneSeg encoder used to generate BEV features. Then, we introduce lane segmentation decoder and lane attention. Finally, we propose lane segmentation predictors along with training losses.

LaneSeg Encoder

The encoder converts the surround image into BEV features for Lane Segment extraction. We utilize the standard ResNet-50 backbone to derive feature maps from raw images. The PV to BEV encoder module using BEVFormer is then used for view conversion.

LaneSeg Decoder

Transformer-based detection method utilizes the decoder to collect features from BEV features and updates the decoder query through multiple layers. Each decoder layer utilizes self-attention, cross-attention mechanisms, and feed-forward networks to update the query. Additionally, learnable location queries are employed. The updated query is then output and fed to the next stage.

Due to complex and elongated map geometries, collecting long-range BEV features is crucial for online mapping tasks. Previous work utilizes hierarchical (instance point) decoder queries and deformable attention to extract local features for each point query. Although this approach avoids capturing long-distance information, it comes with high computational cost due to the increased number of queries.

Lane Segment, as a lane instance representation for constructing scene graphs, has superior characteristics at the instance level. Our goal is not to use multi-point queries, but to use single instance queries to represent Lane Segments. Therefore, the core challenge is how to use single instance queries to cross-focus on global BEV features.

Lane Attention: In target detection, deformable attention uses the position prior of the target and only focuses on a small part of the attention values near the target reference point as a pre-filter, which greatly improves the Convergence is accelerated. During layer iterations, a reference point is placed at the center of the prediction target to refine the sampling locations of attention values, which are dispersed around the reference point via learnable sampling offsets. Intentional initialization of sample offsets includes the geometry preceding the 2D target. By doing so, the multi-branch mechanism can capture the characteristics of each direction well, as shown in Figure 3a.

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

In the context of map learning, Li et al. used naive deformable attention to predict center lines. However, as shown in Figure 3b, due to the naive placement of the reference points, it may not be able to obtain lone range attention. Furthermore, due to the elongated shape of the target and complex visual cues (e.g., accurately predicting breakpoints between solid and dashed lines), this process requires additional adaptive design for our task. Considering all these characteristics, it is necessary for the network to have the ability to not only pay attention to long-range contextual information, but also accurately extract local details. Therefore, it is recommended to distribute the sampling locations over a large area to effectively perceive long-distance information. On the other hand, local details should be easily distinguishable to identify key points. It is worth noting that although there is a competitive relationship between value features within a single attention head, value features between different heads can be retained during the Attention process. Therefore, it is promising to explicitly exploit this property to promote attention to local features of a specific region.

To this end, this article proposes to establish a heads-to-regions mechanism. We first distribute multiple reference points evenly within the Lane Segment area. The sampling locations are then initialized around each reference point in the local area. To preserve complex local details, we use a multi-branch mechanism, where each head focuses on a specific set of sampling locations within a local area, as shown in Figure 3c.

A mathematical description of the lane attention module is now provided. Given BEV features, i-th Lane Segment query feature qi and a set of reference points pi as input, lane attention is calculated as follows:

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

The same initialization of the reference points: Ref. The location of the points is the determining factor for the functionality of the lane attention module. In order to align the area of interest of each instance query with its actual geometry and location, the reference point p in each instance query is distributed based on the Lane Segment prediction of the previous layer, as shown in Figure 3c. and iteratively refine the predictions.

Previous work argued that the reference points provided to the first layer should be individually initialized with learnable priors derived from position query embeddings. However, since the location query is independent of the input image, this initialization method may in turn limit the model's ability to remember geometric and location priors, and incorrectly generated initialization locations can also pose an obstacle to training.

Therefore, for the first layer of the Lane Segment decoder, we propose the same initialization strategy. In the first layer, each head takes the same reference point generated by the position query. Compared with the distributed initialization of reference points in traditional methods (i.e., initializing multiple reference points for each query), the same initialization will make the learning of position priors more stable by filtering out the interference of complex geometries. Note that the same initialization may seem counter-intuitive, but has been observed to work.

LaneSeg Predictor

We use MLP in multiple prediction branches to generate the final predicted Lane Segment from the Lane Segment query, taking into account geometric, semantic and topological aspects. .

For geometry, we first designed a centerline regression branch to regress the vectorized point position of the centerline in three-dimensional coordinates. The output format is. Due to the symmetry of the left and right lane boundaries, we introduce an offset branch to predict the offset, whose format is. Therefore, the left and right lane boundary coordinates can be calculated using

Assuming that lane segments can be conceptualized as drivable areas, we integrate the instance segmentation branch into the predictor. In terms of semantics, three classification branches predict the classification score of C, and the score of C in parallel. The topological branch takes the updated query features as input and outputs a weighted adjacency matrix of the lane graph G using MLP.

Training Loss

LaneSegNet adopts a DETR-like paradigm, using the Hungarian algorithm to efficiently compute a one-to-one optimal allocation between predictions and ground truth. The training loss is then calculated based on the distribution results. The loss function consists of four parts: geometric loss, classification loss, laneline classification loss and topological loss.

Geometric loss supervises the geometric structure of each predicted Lane Segment. According to the binary matching result, a GT Lane Segment is assigned to each predicted vectorized Lane Segment. The vectorized geometric loss is defined as the Manhattan distance calculated between assigned Lane Segment pairs.

Experimental results

Main experimental structure

Lane Segment perception: In Table 1, We compare LaneSegNet with several state-of-the-art methods MapTR, MapTRv2 and TopoNet on the newly introduced Lane Segment-aware benchmark. Retrain their model with our Lane Segment labels. LaneSegNet outperforms other methods by up to 9.6% in mAP, and the average distance error is relatively reduced by 12.5%. LaneSegNet-mini also outperforms previous methods with a higher FPS of 16.2.

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

Qualitative results are shown in Figure 4:

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

Map element detection: In order to Map element detection methods For a fairer comparison, we decompose LaneSegNet’s predicted Lane Segment into pairs of lanes and then compare it with state-of-the-art methods using map element detection metrics. We feed the disassembled lane line and crosswalk labels into several state-of-the-art methods for retraining. The experimental results are shown in Table 2, showing that LaneSegNet always outperforms other methods in map element detection tasks. On a fair comparison, LaneSegNet recovers road geometry better with additional supervision. This shows that Lane Segment learning representation is good at capturing road geometric information.

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

Centerline Awareness: We also compare LaneSegNet with state-of-the-art centerline awareness methods in Table 3. For consistency, center lines are also extracted from Lane Segment for retraining. It can be concluded that LaneSegNet's performance in the lane map perception task is significantly higher than other methods. With additional geographical monitoring, LaneSegNet also demonstrates superior topological reasoning capabilities. It is proved that reasoning ability is closely related to strong positioning and detection capabilities.

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

Ablation Experiment

Lane Segment Formula: In Table 4, we provide ablation to verify our proposed Lane Segment learning formula design advantages and training efficiency. Compared to the separately trained models in the first two rows, joint training of centerlines and map elements brings an overall average improvement of 1.3 on the two main metrics, as shown in row 4, demonstrating the feasibility of multi-task training. However, the common approach of training centerlines and map elements in a single branch by adding additional categories leads to significant performance degradation. Compared with the above naive single-branch method, our model trained with Lane Segment labels obtains significant performance enhancement (7.2 on OLS and 4.4 on mAP for comparison between rows 3 and 5), which verifies The positive interaction between various road information in our map learning formulation is demonstrated. Our model even outperforms multi-branch methods, especially in centerline awareness (OLS of 4.8). This shows that geometry can guide topological reasoning in our map learning formulation, where the multi-branch model only slightly outperforms the CL-only model (0.6 OLS between rows 1 and 4). As for the small decrease, it comes from the reshaping process of our prediction results and is caused by the error of line classification.

ICLR24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness

Lane Attention Module: We show the attention The force module ablation is shown in Table 5. To facilitate a fair comparison, we replace the lane attention module in the framework with an alternative attention design. With our careful design, LaneSegNet with lane attention significantly outperforms these methods, showing significant improvements (mAP improved by 3.9 and TOPll improved by 1.2 compared to row 1). Furthermore, the decoder latency can be further reduced (from 23.45ms to 20.96ms) due to the reduction in the number of queries compared to the hierarchical query design.

Conclusion

This paper proposes Lane Segment awareness as a new map learning formula, and proposes LaneSegNet, an end-to-end solution specifically for this problem network. In addition to the network, two innovative enhancements are proposed, including a lane attention module that employs a head-to-region mechanism to capture long-distance attention, and the same initialization strategy of reference points to enhance the location of lane attention. Prior learning. Experimental results on the OpenLane-V2 dataset demonstrate the effectiveness of our design.

Limitations and Future Work. Due to computational limitations, we do not extend the proposed LaneSegNet to more additional backbones. The formulation of Lane Segment awareness and LaneSegNet may benefit downstream tasks and is worth exploring in the future.

The above is the detailed content of ICLR'24 new ideas without pictures! LaneSegNet: map learning based on lane segmentation awareness. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks ago By DDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

WWE 2K25: How To Unlock Everything In MyRise

4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7495

CakePHP Tutorial

1377

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

Related knowledge

The world's most powerful open source MoE model is here, with Chinese capabilities comparable to GPT-4, and the price is only nearly one percent of GPT-4-Turbo May 07, 2024 pm 04:13 PM

Imagine an artificial intelligence model that not only has the ability to surpass traditional computing, but also achieves more efficient performance at a lower cost. This is not science fiction, DeepSeek-V2[1], the world’s most powerful open source MoE model is here. DeepSeek-V2 is a powerful mixture of experts (MoE) language model with the characteristics of economical training and efficient inference. It consists of 236B parameters, 21B of which are used to activate each marker. Compared with DeepSeek67B, DeepSeek-V2 has stronger performance, while saving 42.5% of training costs, reducing KV cache by 93.3%, and increasing the maximum generation throughput to 5.76 times. DeepSeek is a company exploring general artificial intelligence

AI subverts mathematical research! Fields Medal winner and Chinese-American mathematician led 11 top-ranked papers | Liked by Terence Tao Apr 09, 2024 am 11:52 AM

AI is indeed changing mathematics. Recently, Tao Zhexuan, who has been paying close attention to this issue, forwarded the latest issue of "Bulletin of the American Mathematical Society" (Bulletin of the American Mathematical Society). Focusing on the topic "Will machines change mathematics?", many mathematicians expressed their opinions. The whole process was full of sparks, hardcore and exciting. The author has a strong lineup, including Fields Medal winner Akshay Venkatesh, Chinese mathematician Zheng Lejun, NYU computer scientist Ernest Davis and many other well-known scholars in the industry. The world of AI has changed dramatically. You know, many of these articles were submitted a year ago.

What's going on when the network can't connect to the wifi? Apr 03, 2024 pm 12:11 PM

1. Check the wifi password: Make sure the wifi password you entered is correct and pay attention to case sensitivity. 2. Confirm whether the wifi is working properly: Check whether the wifi router is running normally. You can connect other devices to the same router to determine whether the problem lies with the device. 3. Restart the device and router: Sometimes, there is a malfunction or network problem with the device or router, and restarting the device and router may solve the problem. 4. Check the device settings: Make sure the wireless function of the device is turned on and the wifi function is not disabled.

Hello, electric Atlas! Boston Dynamics robot comes back to life, 180-degree weird moves scare Musk Apr 18, 2024 pm 07:58 PM

Boston Dynamics Atlas officially enters the era of electric robots! Yesterday, the hydraulic Atlas just "tearfully" withdrew from the stage of history. Today, Boston Dynamics announced that the electric Atlas is on the job. It seems that in the field of commercial humanoid robots, Boston Dynamics is determined to compete with Tesla. After the new video was released, it had already been viewed by more than one million people in just ten hours. The old people leave and new roles appear. This is a historical necessity. There is no doubt that this year is the explosive year of humanoid robots. Netizens commented: The advancement of robots has made this year's opening ceremony look like a human, and the degree of freedom is far greater than that of humans. But is this really not a horror movie? At the beginning of the video, Atlas is lying calmly on the ground, seemingly on his back. What follows is jaw-dropping

KAN, which replaces MLP, has been extended to convolution by open source projects Jun 01, 2024 pm 10:03 PM

Earlier this month, researchers from MIT and other institutions proposed a very promising alternative to MLP - KAN. KAN outperforms MLP in terms of accuracy and interpretability. And it can outperform MLP running with a larger number of parameters with a very small number of parameters. For example, the authors stated that they used KAN to reproduce DeepMind's results with a smaller network and a higher degree of automation. Specifically, DeepMind's MLP has about 300,000 parameters, while KAN only has about 200 parameters. KAN has a strong mathematical foundation like MLP. MLP is based on the universal approximation theorem, while KAN is based on the Kolmogorov-Arnold representation theorem. As shown in the figure below, KAN has

Google is ecstatic: JAX performance surpasses Pytorch and TensorFlow! It may become the fastest choice for GPU inference training Apr 01, 2024 pm 07:46 PM

The performance of JAX, promoted by Google, has surpassed that of Pytorch and TensorFlow in recent benchmark tests, ranking first in 7 indicators. And the test was not done on the TPU with the best JAX performance. Although among developers, Pytorch is still more popular than Tensorflow. But in the future, perhaps more large models will be trained and run based on the JAX platform. Models Recently, the Keras team benchmarked three backends (TensorFlow, JAX, PyTorch) with the native PyTorch implementation and Keras2 with TensorFlow. First, they select a set of mainstream

Tesla robots work in factories, Musk: The degree of freedom of hands will reach 22 this year! May 06, 2024 pm 04:13 PM

The latest video of Tesla's robot Optimus is released, and it can already work in the factory. At normal speed, it sorts batteries (Tesla's 4680 batteries) like this: The official also released what it looks like at 20x speed - on a small "workstation", picking and picking and picking: This time it is released One of the highlights of the video is that Optimus completes this work in the factory, completely autonomously, without human intervention throughout the process. And from the perspective of Optimus, it can also pick up and place the crooked battery, focusing on automatic error correction: Regarding Optimus's hand, NVIDIA scientist Jim Fan gave a high evaluation: Optimus's hand is the world's five-fingered robot. One of the most dexterous. Its hands are not only tactile

FisheyeDetNet: the first target detection algorithm based on fisheye camera Apr 26, 2024 am 11:37 AM

Target detection is a relatively mature problem in autonomous driving systems, among which pedestrian detection is one of the earliest algorithms to be deployed. Very comprehensive research has been carried out in most papers. However, distance perception using fisheye cameras for surround view is relatively less studied. Due to large radial distortion, standard bounding box representation is difficult to implement in fisheye cameras. To alleviate the above description, we explore extended bounding box, ellipse, and general polygon designs into polar/angular representations and define an instance segmentation mIOU metric to analyze these representations. The proposed model fisheyeDetNet with polygonal shape outperforms other models and simultaneously achieves 49.5% mAP on the Valeo fisheye camera dataset for autonomous driving

See all articles