End-to-end differentiable learning for autonomous driving (AD) has recently become a prominent paradigm. A major bottleneck is its huge demand for high-quality labeled data, such as 3D bounding boxes and semantic segmentation, which are notoriously expensive to annotate manually. This difficulty is compounded by the salient fact that behavior within AD samples often follows a long-tailed distribution: most of the collected data may be trivial (e.g., driving forward on a straight road), and only a few situations are safety-critical. In this paper, we explore a practically important but underexplored issue, namely how to achieve sample and label efficiency in end-to-end AD.
Specifically, the paper designs a planning-oriented active learning method that progressively annotates part of the collected raw data according to the proposed diversity and usefulness criteria for planning routes. Empirically, the proposed planning-oriented approach outperforms general active learning approaches by a large margin. Notably, our method achieves performance comparable to state-of-the-art end-to-end AD methods using only 30% of the nuScenes data. We hope our work will inspire future research from a data-centric perspective, in addition to methodological efforts.
Paper link: https://arxiv.org/pdf/2403.02877.pdf
Main contribution of this article:
ActiveAD, an active learning method for the end-to-end AD framework, with diversity and uncertainty metrics designed around the characteristics of AD data.
For active learning in computer vision, initial sample selection is usually based only on the raw images, without additional information or learned features, which has led to the common practice of random initial selection. In AD, additional prior information is available. Specifically, when data are collected from the sensors, conventional information such as the speed and trajectory of the ego vehicle can be recorded at the same time. In addition, weather and lighting conditions usually persist throughout a clip and are easy to annotate at the clip level. This information makes it possible to choose the initial set in an informed way. We therefore design an ego-diversity metric for initial selection.
Ego Diversity: It consists of three parts: 1) weather and lighting, 2) driving commands, and 3) average speed. First, using the scene descriptions in nuScenes, the complete dataset is divided into four mutually exclusive subsets: Day Sunny (DS), Day Rainy (DR), Night Sunny (NS), and Night Rainy (NR). Second, each subset is divided into four categories based on the number of Left, Right, and Straight driving commands in a complete clip: Turn Left (L), Turn Right (R), Overtake (O), and Go Straight (S). The paper sets a threshold τc: if the numbers of both Left and Right commands in a clip are greater than or equal to τc, the clip is regarded as containing an overtaking behavior. If only the number of Left commands exceeds τc, the clip indicates a left turn; if only the number of Right commands exceeds τc, it indicates a right turn. All other cases are considered going straight. Third, the average speed of each scene is computed, and scenes are sorted in ascending order of speed within their subset.
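The three ego-diversity attributes can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are hypothetical, the weather/lighting subset is derived by a naive keyword match on the nuScenes scene description (an assumption), per-frame commands are assumed to be the strings "left", "right", and "straight", and every comparison with τc is read as "at least τc" (also an assumption).

```python
from collections import Counter


def weather_lighting(description: str) -> str:
    """Return "DS", "DR", "NS" or "NR" from a scene description.

    Assumption: a naive keyword match on "night" and "rain"
    (a word such as "train" would also match "rain").
    """
    text = description.lower()
    lighting = "N" if "night" in text else "D"
    weather = "R" if "rain" in text else "S"
    return lighting + weather


def classify_command(commands, tau_c: int) -> str:
    """Map a clip's per-frame driving commands to L, R, O or S.

    Assumption: each command is "left", "right" or "straight", and every
    comparison with the threshold tau_c is read as ">=".
    """
    counts = Counter(commands)
    n_left, n_right = counts["left"], counts["right"]
    if n_left >= tau_c and n_right >= tau_c:
        return "O"  # both directions frequent: overtaking
    if n_left >= tau_c:
        return "L"  # only Left reaches the threshold: turn left
    if n_right >= tau_c:
        return "R"  # only Right reaches the threshold: turn right
    return "S"  # every other case: go straight


def average_speed(speeds) -> float:
    """Mean ego speed over the clip, used to sort clips within a subset."""
    return sum(speeds) / len(speeds)
```

For example, a clip with four Left and three Right commands is labeled Overtake when τc = 3, while a clip with only four Left commands is labeled Turn Left.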
Figure 2 gives a detailed, intuitive illustration of the initial selection process based on multi-way trees.
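Since Figure 2 is not reproduced here, the following is only a hypothetical sketch of a multi-way-tree selection. It assumes the tree branches on weather/lighting and then on driving command, that each leaf holds clip IDs sorted by average speed, that the budget is spread round-robin over the leaves, and that clips are taken at evenly spaced positions along each leaf's speed order. The allocation and spacing rules are assumptions made for illustration.

```python
from collections import defaultdict


def build_tree(clips):
    """clips: iterable of (clip_id, weather, command, avg_speed) tuples.

    Returns {weather: {command: [clip_id, ...]}} with each leaf list
    sorted by ascending average speed.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for clip_id, weather, command, speed in clips:
        tree[weather][command].append((speed, clip_id))
    return {
        w: {c: [cid for _, cid in sorted(items)] for c, items in cmds.items()}
        for w, cmds in tree.items()
    }


def select_initial(tree, budget):
    """Hypothetical initial selection over the leaves of the tree."""
    leaves = [tree[w][c] for w in sorted(tree) for c in sorted(tree[w])]
    # Hand out the budget one clip per leaf per pass (round-robin).
    quota = [0] * len(leaves)
    remaining = min(budget, sum(len(leaf) for leaf in leaves))
    while remaining > 0:
        for i, leaf in enumerate(leaves):
            if remaining > 0 and quota[i] < len(leaf):
                quota[i] += 1
                remaining -= 1
    # Within each leaf, take clips evenly spaced along the speed order.
    selected = []
    for leaf, k in zip(leaves, quota):
        if k:
            step = len(leaf) / k
            selected.extend(leaf[int(j * step)] for j in range(k))
    return selected
```

Spreading the budget over leaves first keeps rare combinations (for example, rainy night overtaking) from being crowded out by the dominant daytime, straight-driving clips.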
This section introduces how to incrementally select new clips for annotation using a model trained on the clips annotated so far. The intermediate model performs inference on the unlabeled clips, and subsequent selections are based on its outputs. Adopting a planning-oriented perspective, three criteria for subsequent data selection are introduced: displacement error, soft collision, and agent uncertainty.
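The incremental procedure can be outlined as a generic pool-based loop. This sketch assumes hypothetical `train` and `score` callables standing in for model training and for the selection criteria; the number of rounds and the per-round budget are likewise illustrative.

```python
def active_learning_loop(labeled, unlabeled, rounds, per_round, train, score):
    """Generic pool-based loop; returns the clips picked in each round.

    train(labeled_clips) -> model and score(model, clip) -> float are
    hypothetical placeholders; a higher score means "annotate sooner".
    """
    labeled, unlabeled = set(labeled), set(unlabeled)
    history = []
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)  # intermediate model trained on labeled clips
        ranked = sorted(unlabeled, key=lambda clip: score(model, clip), reverse=True)
        picked = ranked[:per_round]
        # In practice the picked clips are now sent for manual annotation.
        labeled.update(picked)
        unlabeled.difference_update(picked)
        history.append(picked)
    return history
```

Retraining before every round means each new batch is chosen by a model that has already seen the previous batches.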
Criterion 1: Displacement error (DE). DE is defined as the distance between the planned route τ predicted by the model and the human trajectory τ* recorded in the dataset, computed over the T frames of the scene. Since displacement error is itself a planning performance metric and requires no manual annotation (the human trajectory is recorded along with the sensor data), it naturally becomes the first and most critical criterion for active selection.
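As a minimal sketch, assuming each trajectory is a list of (x, y) waypoints with one point per frame and that the per-frame L2 distances are summed over the T frames (the exact aggregation is an assumption), DE can be computed as:

```python
import math


def displacement_error(planned, human):
    """DE between the predicted plan and the recorded human trajectory.

    planned, human: sequences of (x, y) waypoints, one per frame.
    Assumption: per-frame L2 distances are summed over the T frames.
    """
    if len(planned) != len(human):
        raise ValueError("both trajectories must cover the same T frames")
    return sum(math.dist(p, h) for p, h in zip(planned, human))
```

Clips with the largest DE are those where the intermediate planner disagrees most with the human driver, so they would be ranked first for annotation.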
Criterion 2: Soft collision (SC). L_SC is defined as the distance between the predicted ego-vehicle trajectory and the predicted agent trajectories. Agent predictions with low confidence are filtered out by a threshold ε. In each scenario, the shortest distance is taken as the measure of danger, and the term is defined to maintain a positive correlation with this nearest distance:
Soft collision is used as a criterion for two reasons. First, unlike displacement error, the collision rate can only be calculated from 3D box annotations of the targets, which are not available for unlabeled data; the criterion must therefore be computable from the model's inference results alone. Second, consider a hard collision criterion: assign 1 if the predicted ego trajectory collides with the predicted trajectory of any other agent, and 0 otherwise. This would yield too few samples labeled 1, because the collision rate of state-of-the-art AD models is usually small (below 1%). The closest distance to other agents is therefore used instead of the collision rate, since the risk is considered much higher when other vehicles or pedestrians are too close. In short, soft collision is an effective measure of collision likelihood and provides dense supervision.
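The contrast between a hard and a soft criterion can be sketched as follows. Assumptions: ego and agent trajectories share the same time steps, distances are compared at the same step, each agent prediction carries a single confidence compared against ε, and `radius` is a hypothetical collision radius used only for the hard indicator. The exact transform the paper applies to the nearest distance to obtain L_SC is not reproduced; here clips would simply be ranked by ascending nearest distance.

```python
import math


def nearest_agent_distance(ego_traj, agent_preds, eps):
    """Smallest same-step distance between the ego plan and any kept agent.

    ego_traj: list of (x, y) ego waypoints, one per future step.
    agent_preds: list of (confidence, trajectory) pairs on the same steps.
    Predictions with confidence below eps (threshold ε) are ignored.
    Returns math.inf if no prediction survives the filter.
    """
    best = math.inf
    for conf, traj in agent_preds:
        if conf < eps:
            continue
        for ego_pt, agent_pt in zip(ego_traj, traj):
            best = min(best, math.dist(ego_pt, agent_pt))
    return best


def hard_collision(ego_traj, agent_preds, eps, radius):
    """0/1 indicator for comparison: 1 only if an agent comes within radius."""
    return int(nearest_agent_distance(ego_traj, agent_preds, eps) < radius)
```

With a small radius the hard indicator is 0 for almost every scene, while the soft distance still orders all scenes by how close the predicted interaction gets.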
Criterion 3: Agent uncertainty (AU). Predictions of the future trajectories of surrounding agents are inherently uncertain, so motion prediction modules typically generate multiple modes with corresponding confidence scores. The goal is to select data in which nearby agents have high uncertainty. Specifically, distant agents are filtered out by a distance threshold δ, and the weighted entropy of the predicted mode probabilities is computed for the remaining agents. Assume that the number of modes is Nm and the confidence score of agent a for mode i is Pi(a), where i∈{1,…,Nm}. Agent uncertainty can then be defined as:
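Because the weighting in the definition is not reproduced here, the following sketch leaves the per-agent weight as a hypothetical parameter (uniform by default) and, as a further assumption, normalizes each agent's confidences Pi(a) before taking the Shannon entropy:

```python
import math


def mode_entropy(probs):
    """Shannon entropy (nats) of one agent's mode confidences P_1(a)..P_Nm(a).

    Confidences are normalized to sum to 1 first (an assumption).
    """
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def agent_uncertainty(agents, delta, weight=lambda dist: 1.0):
    """agents: iterable of (distance_to_ego, [P_1(a), ..., P_Nm(a)]) pairs.

    Agents farther than delta (threshold δ) are filtered out; weight is a
    hypothetical per-agent weight, uniform by default.
    """
    return sum(weight(d) * mode_entropy(probs) for d, probs in agents if d <= delta)
```

An agent whose modes are equally likely contributes the maximum entropy log(Nm), while an agent with one dominant mode contributes almost nothing.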
## Overall Loss
Figure 3: Visualization of selected scenes. Front-camera images of the scenes selected by the displacement error (column 1), soft collision (column 2), agent uncertainty (column 3), and mixed (column 4) criteria, using a model trained on 10% of the data. Mixed denotes our final selection strategy, ActiveAD, which takes all three preceding criteria into account.
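How the three criteria are combined into the final score is not spelled out here. One plausible sketch, under loudly stated assumptions, min-max normalizes each criterion over the unlabeled pool, inverts the soft-collision distance so that closer scenes score higher, sums the three with hypothetical equal weights, and picks the top-k clips. The weights, the normalization, and the `sc_cap` used to clip infinite distances are all illustrative choices, not the paper's.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_batch(clip_ids, de, sc_dist, au, k, weights=(1.0, 1.0, 1.0), sc_cap=50.0):
    """Return the k clips with the highest hypothetical mixed score.

    de: displacement errors (larger = more informative)
    sc_dist: nearest ego-agent distances (smaller = riskier, so inverted)
    au: agent uncertainties (larger = more informative)
    weights, sc_cap: illustrative values; sc_cap clips infinite distances.
    """
    w_de, w_sc, w_au = weights
    capped = [min(d, sc_cap) for d in sc_dist]
    n_de, n_sc, n_au = min_max(de), min_max(capped), min_max(au)
    scores = [w_de * a + w_sc * (1.0 - b) + w_au * c for a, b, c in zip(n_de, n_sc, n_au)]
    order = sorted(range(len(clip_ids)), key=lambda i: scores[i], reverse=True)
    return [clip_ids[i] for i in order[:k]]
```

Normalizing first keeps one criterion with a large numeric range (for example, DE in meters summed over many frames) from dominating the others.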
Table 4: Performance in various scenarios. Average L2 (m) and average collision rate (%) of the model actively trained on 30% of the data under different weather/lighting and driving-command conditions; lower is better.
Figure 4: Similarity between criteria. The panels compare the newly sampled scenes when selecting 10% (left) and 20% (right) of the data with four criteria: Displacement Error (DE), Soft Collision (SC), Agent Uncertainty (AU), and Mixed (MX).
To address the high annotation cost and long-tail problems of end-to-end autonomous driving data, we are the first to develop a tailored active learning solution, ActiveAD. ActiveAD introduces new task-specific diversity and uncertainty metrics based on a planning-oriented philosophy. Extensive experiments demonstrate the effectiveness of the method: using only 30% of the data, it significantly outperforms previous general-purpose active learning methods and achieves performance comparable to state-of-the-art models. This is a meaningful exploration of end-to-end autonomous driving from a data-centric perspective, and we hope our work will inspire future research and discovery.