Trajectory prediction plays an important role in autonomous driving. It means predicting the future trajectories of vehicles and other traffic participants by analyzing the data collected while driving. As a core module of autonomous driving, trajectory prediction directly affects the quality of downstream planning and control. The task spans a broad technology stack: you need to be familiar with dynamic and static perception, high-definition maps, lane lines, and neural network architectures (CNN, GNN, Transformer), among other things, so it is hard to get started! Many readers hope to get up to speed on trajectory prediction quickly and avoid pitfalls, so today I will go over some common questions and introductory learning methods for trajectory prediction!
A: Start with the survey's sections on problem formulation, deep-learning-based methods (sequential networks and graph neural networks), and evaluation.
Behavior and trajectory are not the same. Behavior usually refers to the maneuvers the target vehicle may perform, such as changing lanes, parking, overtaking, accelerating, turning left, turning right, or going straight. A trajectory is a sequence of specific future positions, each with a timestamp.
In the table on the right, rows whose OBJECT_TYPE is AV usually represent the self-driving vehicle itself. The dataset usually specifies one or more obstacles to be predicted in each scene; these are called targets or focal agents. Some datasets also provide a semantic label for each obstacle, such as vehicle, pedestrian, or bicycle.
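For readers who want to poke at such a table, here is a minimal sketch of grouping rows by object type and track. It assumes the Argoverse 1 forecasting CSV layout (columns TIMESTAMP, TRACK_ID, OBJECT_TYPE, X, Y, CITY_NAME, with OBJECT_TYPE values AV, AGENT, OTHERS); the sample rows, track IDs and the helper name `group_tracks` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the assumed Argoverse-1-style layout.
SAMPLE = """TIMESTAMP,TRACK_ID,OBJECT_TYPE,X,Y,CITY_NAME
0.0,av,AV,100.0,200.0,PIT
0.0,track-a,AGENT,110.0,205.0,PIT
0.0,track-b,OTHERS,90.0,190.0,PIT
0.1,av,AV,101.0,200.5,PIT
0.1,track-a,AGENT,111.0,205.2,PIT
0.1,track-b,OTHERS,90.5,190.1,PIT
"""

def group_tracks(csv_text):
    """Group rows by OBJECT_TYPE, then TRACK_ID, into time-ordered (x, y) lists."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["OBJECT_TYPE"]][row["TRACK_ID"]].append(
            (float(row["TIMESTAMP"]), float(row["X"]), float(row["Y"]))
        )
    # Sort each track by timestamp, then keep only the positions.
    return {
        obj_type: {tid: [(x, y) for _, x, y in sorted(pts)] for tid, pts in by_id.items()}
        for obj_type, by_id in grouped.items()
    }

tracks = group_tracks(SAMPLE)
ego = tracks["AV"]         # the self-driving vehicle
targets = tracks["AGENT"]  # the focal agent(s) to predict
```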
Q2: Is the data format the same for vehicles and pedestrians? For example, does a single point-cloud point represent a pedestrian while dozens of points represent a vehicle?
A: Trajectory datasets like this give the xyz coordinates of each object's center point, for pedestrians and vehicles alike.
Q3: The argo1 and argo2 datasets each specify only one obstacle to predict, right? How should these two datasets be used for multi-agent prediction?
A: argo1 specifies only one obstacle, while argo2 may specify as many as twenty. However, even if only one obstacle is specified, nothing stops your model from predicting multiple obstacles.
A: Treat the "predicted" ego-vehicle trajectory as the ego vehicle's planned trajectory; you can refer to UniAD.
A: Neural-network approaches basically don't require it, while rule-based approaches require some of that knowledge.
A: First read a survey and sort out a mind map of the field, for example "Machine Learning for Autonomous Vehicle's Trajectory Prediction: A Comprehensive Survey, Challenges, and Future Research Directions". Please read the original English text of this survey.
A1 (student): I'd say prediction belongs to perception by default, or prediction is implicit in decision-making; either way, you can't do without prediction.
A2 (student): Decision-making should be handled by planning and control, which includes behavior planning; more advanced setups handle interaction and game-theoretic reasoning, and some companies have a separate interaction/game group.
A: Prediction deals with other vehicles' trajectories, while planning and control deals with the ego vehicle's trajectory. The two trajectories affect each other, so prediction is generally placed under planning and control.
Q: Some public material, such as XPeng's perception network XNet, outputs predicted trajectories at the same time. In that case, it seems prediction sits under the perception module. Or do both modules have their own prediction modules with different goals?
A: They affect each other, so in some places prediction and decision-making are one group. For example, if your own car plans a trajectory that squeezes in on another car, the other car will usually give way. Therefore, some work treats the ego vehicle's plan as part of the input for predicting other vehicles. You can refer to M2I (M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction), which follows a similar idea. You can also look at PiP: Planning-informed Trajectory Prediction for Autonomous Driving.
A: Manually labeled.
A: HiVT can serve as a baseline; many people use it.
A: It has some generalization ability, and results are decent even without retraining.
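A (student): Pick the one with the best result.
Q2: How is "best" determined? By the probability value, or by the distance to the ground truth (gt)?
A: In practice, without ground truth, if you want the "best" trajectory you can only trust the one with the highest predicted probability.
Q3: So when gt is available, can the best trajectory be chosen by either the endpoint distance or the average distance to gt?
A: Yes, it depends on how the metric is defined.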
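The two selection rules from this exchange can be sketched as follows, assuming K predicted modes stored as a (K, T, 2) array with one probability per mode. "fde" picks by final-point distance and "ade" by mean per-step distance; the function names and the toy data are hypothetical. The toy case is built so that the two metrics disagree.

```python
import numpy as np

def pick_without_gt(trajs, probs):
    """No ground truth: trust the mode with the highest predicted probability."""
    return int(np.argmax(probs))

def pick_with_gt(trajs, gt, metric="fde"):
    """With ground truth: pick the mode closest to gt.

    trajs: (K, T, 2) predicted modes; gt: (T, 2).
    """
    dists = np.linalg.norm(trajs - gt[None], axis=-1)  # (K, T) per-step errors
    if metric == "fde":
        scores = dists[:, -1]          # final-point (endpoint) error
    elif metric == "ade":
        scores = dists.mean(axis=1)    # average error over the horizon
    else:
        raise ValueError(f"unknown metric: {metric}")
    return int(np.argmin(scores))

gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
trajs = np.array([
    [[1.0, 0.0], [2.0, 0.0], [3.0, 2.0]],  # mode 0: exact until the last step
    [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]],  # mode 1: constant 1 m offset
])
probs = np.array([0.3, 0.7])

best_no_gt = pick_without_gt(trajs, probs)   # 1
best_fde = pick_with_gt(trajs, gt, "fde")    # 1 (endpoint error 1 vs 2)
best_ade = pick_with_gt(trajs, gt, "ade")    # 0 (mean error 2/3 vs 1)
```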
A: It's all covered in this course; see Chapter 2, and it will also come up in Chapter 4. The difference between heterogeneous and homogeneous graphs: a homogeneous graph has only one type of node and only one type of connection between nodes. For example, in a social network, you can imagine that the only node type is "person" and the only edge type is "knows"; two people either know each other or they don't. But we can also distinguish between people, likes, and tweets. Then people can be connected to each other by knowing each other, people can be connected to tweets by liking them, and two people can also be connected by liking the same tweet (a meta-path). Expressing such diverse nodes and relationships requires heterogeneous graphs. In a heterogeneous graph there are many types of nodes and many types of connections (edges) between them, and even more combinations of those connections (meta-paths). These nodes differ in importance, and so do the different connection types.
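The social-network example can be made concrete with a toy heterogeneous graph keyed by (source type, relation, target type), plus a function that derives the "liked the same tweet" meta-path between people. Everything here, including the names and the `metapath_neighbors` helper, is a hypothetical illustration rather than any library's API.

```python
from collections import defaultdict

# Hypothetical toy heterogeneous graph: two node types, two edge types.
nodes = {
    "person": ["alice", "bob", "carol"],
    "tweet": ["t1", "t2"],
}
edges = {
    ("person", "knows", "person"): [("alice", "bob")],
    ("person", "likes", "tweet"): [("alice", "t1"), ("carol", "t1"), ("bob", "t2")],
}

def metapath_neighbors(edges, relation):
    """Meta-path person -likes-> tweet <-likes- person:
    connect two different people who liked the same tweet."""
    liked_by = defaultdict(set)
    for person, tweet in edges[relation]:
        liked_by[tweet].add(person)
    pairs = set()
    for people in liked_by.values():
        for a in people:
            for b in people:
                if a != b:
                    pairs.add((a, b))
    return pairs

pairs = metapath_neighbors(edges, ("person", "likes", "tweet"))
# alice and carol become meta-path neighbors via t1; bob liked only t2, so he gets none.
```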
A: You can select cars within a certain radius, or take the K nearest neighbors. You can even design a more advanced heuristic neighbor-screening strategy yourself, or let the model learn by itself whether two cars are neighbors.
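Here is a sketch of the two simplest strategies on agent positions at a single time step. Agents are assumed to be an (N, 2) array of positions, and the function names are hypothetical. Note that kNN adjacency is generally not symmetric: agent i can be among agent j's nearest neighbors without the reverse holding.

```python
import numpy as np

def radius_neighbors(pos, radius):
    """pos: (N, 2) positions at one time step. Returns boolean (N, N) adjacency."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adj = dist <= radius
    np.fill_diagonal(adj, False)  # an agent is not its own neighbor
    return adj

def knn_neighbors(pos, k):
    """Row i marks the k agents closest to agent i (self excluded)."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # never pick yourself
    idx = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros_like(dist, dtype=bool)
    np.put_along_axis(adj, idx, True, axis=1)
    return adj

pos = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
r_adj = radius_neighbors(pos, radius=2.0)  # agent 2 is isolated
k_adj = knn_neighbors(pos, k=1)            # agent 2 still gets agent 1 as a neighbor
```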
Q2: Suppose we use a fixed range. Is there any principle for choosing the radius? Also, at which time step should the neighbors be selected?
A: It is hard to give a standard answer on the radius. The question is essentially how much distant information the model needs when making predictions, a bit like choosing the size of a convolution kernel. For the second question, my personal rule is: if you want to model the interaction between objects at a given time step, select neighbors based on their relative positions at that time step.
Q3: In that case, do we need to model neighbors over the historical time window, since the set of surrounding vehicles within range changes from step to step? Or should we only consider the surrounding vehicles at the current moment?
A: Either works; it depends on how you design the model.
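Following the rule above (select neighbors from relative positions at the time step being modeled), one option is to compute a separate neighbor mask for every history step; using only the last slice corresponds to "current moment only". This is a sketch assuming trajectories stored as (N, T, 2) and an optional (N, T) validity mask for missing frames; `neighbors_per_step` is a hypothetical name.

```python
import numpy as np

def neighbors_per_step(traj, radius, valid=None):
    """traj: (N, T, 2) agent positions over the history window.

    Returns a boolean mask (T, N, N): mask[t, i, j] is True when agent j is
    within `radius` of agent i at time step t. `valid` (N, T, bool) optionally
    marks observed steps so unobserved agents get no edges at that step.
    """
    pos = traj.transpose(1, 0, 2)  # (T, N, 2)
    dist = np.linalg.norm(pos[:, :, None, :] - pos[:, None, :, :], axis=-1)  # (T, N, N)
    mask = dist <= radius
    mask &= ~np.eye(traj.shape[0], dtype=bool)[None]  # no self-loops
    if valid is not None:
        v = valid.T  # (T, N)
        mask &= v[:, :, None] & v[:, None, :]
    return mask

# Agent 0 stays at the origin; agent 1 approaches from 10 m to 1 m away.
traj = np.array([[[0.0, 0.0], [0.0, 0.0]],
                 [[10.0, 0.0], [1.0, 0.0]]])
mask = neighbors_per_step(traj, radius=2.0)
current_only = mask[-1]  # neighbors at the current moment only
```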
A: Just read it. The operations in MotionFormer are fairly conventional; you will see similar self-attention (SA) and cross-attention (CA) in many papers. Many SOTA models today are fairly heavy; for example, the decoder often performs iterative refinement.
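To show what "SA plus CA in the decoder, with iterative refinement" roughly means, here is a heavily simplified NumPy sketch. It is not MotionFormer's or any specific model's code: learned projections, multiple heads, normalization and feed-forward layers are omitted, and the trajectory head `to_traj` is a hypothetical random matrix.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention. q: (Lq, d), k and v: (Lk, d)."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def decoder_layer(mode_queries, scene_tokens):
    """SA among the K mode queries, then CA from the queries to the scene tokens."""
    x = mode_queries + attention(mode_queries, mode_queries, mode_queries)  # SA
    x = x + attention(x, scene_tokens, scene_tokens)                        # CA
    return x

rng = np.random.default_rng(0)
K, M, d, T = 6, 32, 16, 30  # modes, encoded agent/map tokens, feature dim, future steps
queries = rng.normal(size=(K, d))
scene = rng.normal(size=(M, d))
to_traj = rng.normal(size=(d, T * 2)) * 0.1  # hypothetical linear trajectory head

# Iterative refinement: each round re-decodes and adds an offset to the current trajectories.
trajs = np.zeros((K, T, 2))
for _ in range(3):
    queries = decoder_layer(queries, scene)
    trajs = trajs + (queries @ to_traj).reshape(K, T, 2)
```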
A2: 1. It does marginal prediction rather than joint prediction; 2. prediction and planning are done separately, without explicitly modeling the interactive game between the ego vehicle and surrounding agents; 3. it uses a scene-centric representation without considering symmetry, which inevitably limits performance.
Q2: What is marginal prediction?
A: For details, please refer to Scene Transformer.
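The difference can be illustrated on the evaluation side. In this sketch, marginal prediction is assumed to give K modes per agent, each agent choosing its best mode independently, while joint prediction treats mode k as one scene-level future shared by all agents. The function names and toy data are hypothetical. Marginal scoring can reward combinations of modes that never occur together, which is why the joint score is stricter.

```python
import numpy as np

def min_ade_marginal(pred, gt):
    """pred: (A, K, T, 2), gt: (A, T, 2). Each agent picks its best mode independently."""
    ade = np.linalg.norm(pred - gt[:, None], axis=-1).mean(axis=-1)  # (A, K)
    return float(ade.min(axis=1).mean())

def min_ade_joint(pred, gt):
    """Mode k is a scene-level future: one k must be chosen for all agents at once."""
    ade = np.linalg.norm(pred - gt[:, None], axis=-1).mean(axis=-1)  # (A, K)
    return float(ade.mean(axis=0).min())

# Two agents, two modes, one future step, ground truth at the origin.
gt = np.zeros((2, 1, 2))
pred = np.zeros((2, 2, 1, 2))
pred[0, 1, 0] = [2.0, 0.0]  # agent 0 is correct only in mode 0
pred[1, 0, 0] = [2.0, 0.0]  # agent 1 is correct only in mode 1

marginal = min_ade_marginal(pred, gt)  # 0.0: each agent finds a perfect mode
joint = min_ade_joint(pred, gt)        # 1.0: no single scene mode is right for both
```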
Q3: Regarding the third point, how should we understand "scene-centric without considering symmetry"?
A: I recommend looking at HiVT, QCNet, and MTR. Of course, a symmetric design is not easy to achieve for end-to-end models either.
A2: You can think of it this way: the input is scene-level data, but inside the network each target is modeled from its own viewpoint, looking at the surrounding scene with itself at the center. In a single forward pass you get an encoding of each target centered on itself, and then you can model the interactions between these encodings.
A: Each agent has its own local region, and the local region is centered on this agent.
A: It can be understood as the direction in which the front of the car is pointing.
A: I'm not sure I understand it correctly, but I guess it refers to whether a lane is controlled by traffic lights, stop signs, speed-limit signs, etc.
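The agent-centric local region described above is usually built by translating the scene so the agent sits at the origin and rotating it by the agent's heading, so the agent faces +x. This is a minimal sketch of that common convention; the function name is hypothetical and the heading is assumed to be a yaw angle in radians.

```python
import numpy as np

def to_agent_frame(points, origin, heading):
    """Express global (x, y) points in the frame of an agent at `origin`
    whose front points along `heading` (radians, counterclockwise from +x)."""
    c, s = np.cos(heading), np.sin(heading)
    # Multiplying row vectors by this matrix rotates them by -heading.
    rot = np.array([[c, -s], [s, c]])
    return (points - origin) @ rot

# Agent at (1, 1) facing +y (heading = pi/2).
local = to_agent_frame(np.array([[1.0, 3.0], [0.0, 1.0]]), np.array([1.0, 1.0]), np.pi / 2)
# approximately [[2, 0], [0, 1]]: the first point is 2 m ahead, the second 1 m to the left
```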
A: Try both and use whichever works better; each has its advantages. For Laplace loss to work well, there are some details you need to pay attention to.
Q2: Do you mean that some parameters need to be tuned?
A: Compared with L1 loss, Laplace loss predicts one extra scale parameter.
Q3: Yes, but if the model predicts only one trajectory, I don't see what this is for; it feels redundant. I interpret it as uncertainty. Is that correct?
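A: If you have ever derived least squares from scratch, you know that MSE is really the NLL of a Gaussian distribution with constant variance. Likewise, L1 loss is the NLL of a Laplace distribution with constant variance. So the Laplace NLL can be understood as an L1 loss whose variance is not fixed; this variance is predicted by the model itself. To lower the loss, the model assigns a larger variance to samples it fits poorly and a smaller variance to samples it fits well.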
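The answer above can be written out with the two negative log-likelihoods (a standard derivation, where b is the Laplace scale and the Laplace variance is 2b²):

$$
\begin{aligned}
-\log p_{\mathcal{N}}(y \mid \mu, \sigma) &= \frac{(y-\mu)^2}{2\sigma^2} + \frac{1}{2}\log\left(2\pi\sigma^2\right) \\
-\log p_{\mathrm{Laplace}}(y \mid \mu, b) &= \frac{|y-\mu|}{b} + \log(2b) \\
\frac{\partial}{\partial b}\left[\frac{|y-\mu|}{b} + \log(2b)\right] &= -\frac{|y-\mu|}{b^2} + \frac{1}{b} = 0 \;\Longrightarrow\; b^\ast = |y-\mu|
\end{aligned}
$$

With σ or b held constant, the log terms are constants, so minimizing the NLL is the same as minimizing MSE or L1. When b is predicted per sample, the loss-minimizing scale equals the residual, so poorly fit samples receive a large b, which shrinks the weight 1/b on their error term.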
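Q4: So can I take it that for very noisy datasets (trajectory data with missing frames and jitter), Laplace isn't very suitable, because the model has to fit this variance? Does it require a fairly high-quality dataset?
A: I don't think that necessarily holds. In effect, it encourages the model to learn the easy-to-fit samples first and then the hard ones.
Q5: I'd also like to ask how to understand the statement "for Laplace loss to work well, there are some details to pay attention to."
A: It's mainly about predicting the scale. In the model, the branch that predicts the location and the branch that predicts the scale should be decoupled as much as possible so they don't interfere with each other. The scale branch must guarantee an output > 0. Most people use exp as the activation to ensure non-negativity, but I found that ELU + 1 works better. Also, the lower bound of the scale should preferably not be 0; it's better to keep scale > 0.01 or > 0.1 or so. These are all my personal views. In fact, all these details are in my open-source code (Zikang Zhou's open-source code on GitHub), though people may not have noticed them.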
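A minimal NumPy sketch of those details, under stated assumptions: location and scale come from two separate (hypothetical, randomly initialized) linear branches, the scale uses ELU + 1 plus a hypothetical `min_scale` offset as one way to enforce the lower bound, and the loss is the Laplace NLL. This is not the code from the linked repositories. One plausible reason to prefer ELU + 1 over exp is that it grows linearly rather than exponentially for large inputs.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise (clamped to avoid overflow)."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def scale_activation(raw, min_scale=0.01):
    """ELU + 1 is positive everywhere; adding min_scale keeps the scale away from 0."""
    return elu(raw) + 1.0 + min_scale

def laplace_nll(loc, scale, target):
    """Mean Laplace negative log-likelihood: |y - mu| / b + log(2b)."""
    return float(np.mean(np.abs(target - loc) / scale + np.log(2.0 * scale)))

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16))           # 8 samples, 16-dim shared features
w_loc = rng.normal(size=(16, 2)) * 0.1    # location branch (hypothetical weights)
w_scale = rng.normal(size=(16, 2)) * 0.1  # separate scale branch (hypothetical weights)
target = rng.normal(size=(8, 2))

loc = feat @ w_loc
scale = scale_activation(feat @ w_scale)
loss = laplace_nll(loc, scale, target)
```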
Links: https://github.com/ZikangZhou/QCNet
https://github.com/ZikangZhou/HiVT
https://github.com/L1aoXingyu/pytorch-beginner/tree/master/08-AutoEncoder
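A: A polyline is a broken line made of segments, and each segment can be regarded as a vector.
Q2: Is this polyline related to the edges between nodes in a graph neural network? In other words, does a polyline vector correspond to a node or an edge in the GNN?
A: A polyline can be understood as a node. Trajectory prediction has no explicitly defined edges; how to define them depends on how you understand the problem.
Q3: VectorNet has many subgraphs, each containing many polylines. If a polyline is treated as a vector, that amounts to turning the polyline node into a vector, i.e., vectorizing the node's features, right? And the multiple vectors inside a polyline then make up this node's feature matrix?
A: A map contains many polylines; one polyline is one subgraph; a polyline consists of many short vectors, and each vector is a node in that subgraph.
A: The node granularity differs. As for accuracy, it depends on the specific implementation; as for speed, coarser granularity is obviously more efficient.
Q2: In terms of accuracy, is there any principle for when to choose which?
A: No principle; try both.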
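The last answer (one polyline is one subgraph, each short vector is a node) can be sketched as follows. The vector layout [x_start, y_start, x_end, y_end, polyline_id] and the one-layer encoder with max-pooling are simplifying assumptions; VectorNet's actual subgraph network stacks several layers and concatenates node and pooled features.

```python
import numpy as np

def polyline_to_vectors(points, polyline_id):
    """Split a polyline (P, 2) into P - 1 vectors [x_start, y_start, x_end, y_end, id].
    Each vector is one node of this polyline's subgraph."""
    starts, ends = points[:-1], points[1:]
    ids = np.full((len(starts), 1), polyline_id, dtype=float)
    return np.concatenate([starts, ends, ids], axis=1)

def encode_subgraph(vectors, w):
    """Simplified subgraph encoder: shared linear + ReLU per vector node,
    then max-pool over nodes to get one feature for the whole polyline."""
    node_feats = np.maximum(vectors @ w, 0.0)  # (num_vectors, d)
    return node_feats.max(axis=0)              # (d,)

rng = np.random.default_rng(0)
lane = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5], [3.0, 1.0]])  # hypothetical centerline
vectors = polyline_to_vectors(lane, polyline_id=0)  # shape (3, 5)
w = rng.normal(size=(5, 8))                         # hypothetical weights
polyline_feature = encode_subgraph(vectors, w)      # shape (8,)
```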
A: Feed in sliding-window inputs, for example frames 0-19 and then frames 1-20, compute the squared difference between the scores of corresponding trajectories in the two windows, and aggregate the statistics.
Q2: What metrics do you recommend, Mr. Thomas? I currently use first- and second-order derivatives, but the signal isn't very clear: most of the first- and second-order derivative values are concentrated near 0.
A: I think the squared difference between the scores of corresponding trajectories in consecutive frames is enough. For example, with n consecutive inputs, sum the squared differences and divide by n. Keep in mind, though, that the scene changes in real time, so the scores are expected to change abruptly when an interaction occurs or when moving from a non-intersection into an intersection.
A: Just standardize the data. It may help somewhat, but probably not much.
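A sketch of that metric, assuming a score matrix with one row per sliding-window input and one column per trajectory, where column k is assumed to be the same trajectory across windows (in practice modes may need to be matched between frames first). Here the sum is averaged over the number of consecutive pairs; `score_consistency` is a hypothetical name.

```python
import numpy as np

def score_consistency(scores):
    """scores: (n + 1, K) mode scores from consecutive sliding windows
    (e.g. frames 0-19, 1-20, ...). Returns the mean over the n consecutive
    pairs of the summed squared score differences; lower is more stable."""
    diffs = np.diff(scores, axis=0)  # (n, K)
    return float((diffs ** 2).sum(axis=1).mean())

scores = np.array([[0.5, 0.5], [0.7, 0.3], [0.7, 0.3]])
stability = score_consistency(scores)  # (0.04 + 0.04 + 0) / 2, about 0.04
```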
A: In general there is not much difference between addition and concatenation, and for fusing a category embedding with a numerical embedding they are actually completely equivalent.
Q2: How should "completely equivalent" be understood?
A: Concatenating the two and passing the result through a linear layer is equivalent to passing the numerical embedding through one linear layer, passing the category embedding through another, and adding the results. And passing a category embedding through a linear layer is pointless: in theory, that linear layer can be absorbed into the parameters of nn.Embedding.
A: I'm not sure, but from what I've heard, NVIDIA or some car manufacturer (I don't know which) uses HiVT to predict pedestrians, so real-world deployment is definitely feasible.
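Both steps of this argument can be checked numerically. In this NumPy sketch, a plain array stands in for an nn.Embedding table, the dimensions are arbitrary, and biases are omitted (with biases, the two biases simply merge into one).

```python
import numpy as np

rng = np.random.default_rng(0)
d_num, d_cat, d_out, n_categories = 4, 3, 5, 10

num_emb = rng.normal(size=d_num)                # numerical embedding
table = rng.normal(size=(n_categories, d_cat))  # embedding lookup table
cat_id = 7
cat_emb = table[cat_id]

# 1) Concatenate, then one linear layer W of shape (d_out, d_num + d_cat).
W = rng.normal(size=(d_out, d_num + d_cat))
out_concat = W @ np.concatenate([num_emb, cat_emb])

# 2) Split W by columns: one linear layer per input, then add the results.
W_num, W_cat = W[:, :d_num], W[:, d_num:]
out_add = W_num @ num_emb + W_cat @ cat_emb

# 3) A linear layer after an embedding lookup folds into a new lookup table.
folded_table = table @ W_cat.T                  # (n_categories, d_out)
out_folded = W_num @ num_emb + folded_table[cat_id]
```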
A: Among current occupancy-based future prediction approaches, the most promising should be this one: https://arxiv.org/abs/2308.01471
A: This is probably difficult with public datasets, which generally don't provide the ego vehicle's planned trajectory. A while back there was a paper called PiP by Haoran Song of HKUST. I think papers on conditional prediction, such as M2I, may be what you're looking for.
A (student): This paper discusses it: "Choose Your Simulator Wisely: A Review on Open-source Simulators for Autonomous Driving".