
Interpretation of Tesla's autonomous driving algorithms and models

王林
Release: 2023-04-11 12:04:02

Tesla is a typical AI company: it has trained 75,000 neural networks in the past year, which amounts to a new model every 8 minutes, and a total of 281 of those models have been deployed to Tesla vehicles. Below, we interpret the progress of Tesla FSD's algorithms and models from several angles.

01 Perception Occupancy Network

One of Tesla's key perception technologies this year is the Occupancy Network. Anyone who has studied robotics will be familiar with occupancy grids: occupancy indicates whether each 3D voxel in space is occupied, either as a binary 0/1 value or as a probability in [0, 1].
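To make the representation concrete, here is a minimal sketch of an occupancy grid in Python. The grid extents, the 0.1 m resolution, and the ego-centered coordinate convention are illustrative assumptions, not Tesla's actual parameters.

```python
import numpy as np

# A minimal occupancy grid: each voxel holds a probability in [0, 1].
# Extents, resolution, and origin are illustrative, not Tesla's parameters.
RESOLUTION = 0.1                                   # meters per voxel
ORIGIN = (-40.0, -40.0, 0.0)                       # grid corner in the ego frame
grid = np.zeros((800, 800, 40), dtype=np.float32)  # 80 m x 80 m x 4 m volume

def world_to_voxel(x, y, z):
    """Map a metric ego-frame coordinate to a voxel index."""
    return tuple(int((v - o) / RESOLUTION) for v, o in zip((x, y, z), ORIGIN))

# Mark one location as occupied with probability 0.9 ...
grid[world_to_voxel(5.0, 2.0, 1.0)] = 0.9
# ... and binarize the whole grid at 0.5 for a 0/1 occupancy view.
occupied = grid > 0.5
```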

Why is occupancy estimation important for autonomous driving perception? During driving, common obstacles such as vehicles and pedestrians can have their positions and sizes estimated through 3D object detection, but there are also many long-tail obstacles that matter just as much for driving. For example:

1. Deformable obstacles, such as two-section articulated trailers, which are not well represented by a 3D bounding box;

2. Irregularly shaped obstacles, such as overturned vehicles, for which 3D pose estimation breaks down;

3. Obstacles outside the known categories, such as stones and debris on the road, which cannot be classified.

We therefore want a better representation for these long-tail obstacles: one that fully estimates the occupancy of every position in 3D space, including its semantics and motion (flow).

Tesla uses the specific example below to demonstrate the power of the Occupancy Network. Unlike 3D boxes, the occupancy representation makes few geometric assumptions about an object, so it can model objects of arbitrary shape and arbitrary forms of motion. The figure shows a two-section articulated bus starting to move, with blue representing moving voxels and red representing stationary voxels. The Occupancy Network accurately estimates that the first section of the bus has started moving while the second section is still at rest.


Occupancy estimation of a two-section bus starting to move; blue represents moving voxels, red represents stationary voxels

The model structure of the Occupancy Network is shown below. First, the model extracts features from the multiple cameras using RegNet and BiFPN; this is consistent with the network structure shared at last year's AI Day, indicating that the backbone has not changed much. The model then performs attention-based multi-camera fusion, using spatial queries carrying 3D positions to attend over the 2D image features. How is the connection between a 3D spatial query and a 2D feature map established? The figure does not detail the fusion method, but there are many public papers to draw on, and I think the most likely solution is one of the following two. The first is a 3D-to-2D query: project each 3D spatial query onto the 2D feature maps using the cameras' intrinsic and extrinsic parameters and extract the features at the corresponding positions. This method was proposed in DETR3D, and BEVFormer and PolarFormer adopt the same idea. The second is implicit mapping via positional embeddings: attach a reasonable positional embedding (camera intrinsics and extrinsics, pixel coordinates, and so on) to each position of the 2D feature map, and let the model learn the correspondence between 2D and 3D features by itself. Finally, the model performs temporal fusion by aligning and concatenating the 3D feature volumes according to the known changes in the ego vehicle's position and attitude.


Occupancy Network structure
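As an illustration of the first option, here is a minimal sketch of a DETR3D-style 3D-to-2D query against a single camera, under my own assumptions about tensor shapes; the bilinear sampling via grid_sample is a common implementation choice, not necessarily Tesla's.

```python
import torch
import torch.nn.functional as F

def sample_3d_queries_from_camera(queries_xyz, feat_2d, intrinsics, extrinsics):
    """Project 3D query points into one camera's feature map and sample features.

    queries_xyz: (N, 3) query positions in the ego frame
    feat_2d:     (C, H, W) image feature map from the backbone
    intrinsics:  (3, 3) camera matrix
    extrinsics:  (4, 4) ego-to-camera transform
    """
    n = queries_xyz.shape[0]
    homog = torch.cat([queries_xyz, torch.ones(n, 1)], dim=1)  # (N, 4)
    cam = (extrinsics @ homog.T).T[:, :3]                      # ego -> camera frame
    uvw = (intrinsics @ cam.T).T                               # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-5)              # divide by depth
    h, w = feat_2d.shape[1:]
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1).view(1, n, 1, 2)
    feats = F.grid_sample(feat_2d.unsqueeze(0), grid, align_corners=True)
    valid = uvw[:, 2] > 0                                      # in front of the camera
    return feats.view(feat_2d.shape[0], n).T, valid            # (N, C), (N,)

# Toy usage: 100 spatial queries against a 256-channel feature map.
feats, valid = sample_3d_queries_from_camera(
    torch.randn(100, 3), torch.randn(256, 48, 160), torch.eye(3), torch.eye(4))
```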

After feature fusion, a deconvolution-based decoder decodes the occupancy, semantics, and flow at each 3D position. The presentation emphasized that because this network's output is dense, the output resolution is limited by memory. I believe this is a familiar headache for anyone who works on image segmentation, and what is done here is 3D segmentation, while autonomous driving demands very high resolution (~10 cm). Therefore, inspired by neural implicit representations, an implicitly queryable MLP decoder is placed at the end of the model: given any coordinate (x, y, z) as input, it decodes the information at that position, namely occupancy, semantics, and flow. This breaks the resolution limit of the model, and I consider it a highlight of the design.
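Below is a minimal sketch of what such an implicitly queryable decoder could look like, assuming the fused features live in a 3D volume that is sampled by trilinear interpolation; the layer sizes and output heads are illustrative, not the actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitOccupancyDecoder(nn.Module):
    """Implicitly queryable decoder: given any (x, y, z), look up the fused
    feature volume by trilinear interpolation and decode occupancy, semantics
    and flow. Layer sizes and heads are illustrative assumptions."""

    def __init__(self, feat_dim=128, num_classes=10, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.occ_head = nn.Linear(hidden, 1)             # occupancy probability
        self.sem_head = nn.Linear(hidden, num_classes)   # semantic logits
        self.flow_head = nn.Linear(hidden, 3)            # 3D flow vector

    def forward(self, feature_volume, xyz):
        """feature_volume: (C, D, H, W); xyz: (N, 3) normalized to [-1, 1]."""
        grid = xyz.view(1, -1, 1, 1, 3)                  # query positions
        feats = F.grid_sample(feature_volume.unsqueeze(0), grid,
                              align_corners=True)        # trilinear lookup
        feats = feats.view(feature_volume.shape[0], -1).T            # (N, C)
        h = self.mlp(torch.cat([feats, xyz], dim=1))
        return (torch.sigmoid(self.occ_head(h)),         # occupancy in [0, 1]
                self.sem_head(h), self.flow_head(h))

# Query 1000 arbitrary coordinates: resolution is no longer fixed by the grid.
dec = ImplicitOccupancyDecoder()
occ, sem, flow = dec(torch.randn(128, 16, 100, 100), torch.rand(1000, 3) * 2 - 1)
```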

02 Planning Interactive Planning

Planning is another important module of autonomous driving, and this time Tesla mainly emphasized the modeling of interactions at complex intersections. Why is interaction modeling so important? Because the future behavior of other vehicles and pedestrians carries a degree of uncertainty, a smart planning module must predict, online, the many possible interactions between the ego vehicle and other agents, evaluate the risk each interaction brings, and finally decide which strategy to adopt.

Tesla calls its planning model Interaction Search, and it consists of three main steps: tree search, neural-network trajectory planning, and trajectory scoring.

1. Tree search is a commonly used algorithm for trajectory planning; it can effectively discover the various interactive situations and find the optimal solution. The biggest difficulty in solving trajectory planning by search, however, is that the search space is too large. For example, a complex intersection may involve 20 vehicles relevant to the ego car, which combine into more than 100 interaction patterns, and each interaction pattern may have dozens of candidate spatio-temporal trajectories. Tesla therefore does not search over trajectories directly; instead, it uses a neural network to score the target positions (goals) that might be reached after a period of time, keeping only a small number of promising goals.

2. Once a goal is determined, we need a trajectory that reaches it. Traditional planning methods often solve this by optimization; each optimization is not hard in itself, taking about 1 to 5 milliseconds, but when the previous step produces many candidate goals, the time cost becomes unaffordable. Tesla therefore proposes using another neural network for trajectory planning, enabling highly parallel planning over multiple candidate goals. The trajectory labels for training this network come from two sources: the first is real human driving trajectories, but since a human trajectory is only one of many good solutions, the second source is alternative trajectories produced by offline optimization algorithms.

3. After obtaining a set of feasible trajectories, we need to choose the best one. The solution adopted here is to score the trajectories, combining hand-crafted risk metrics, comfort metrics, and a neural network scorer; a combined code sketch of all three steps follows this list.
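To make the decoupling concrete, here is a combined toy sketch of the three steps. Everything in it is an illustrative stand-in: the feature dimensions, the goal parameterization, and the cost terms are assumptions, not Tesla's Interaction Search.

```python
import torch
import torch.nn as nn

class GoalScorer(nn.Module):
    """Step 1 stand-in: score candidate goal positions so the tree search
    only expands a handful of promising branches."""
    def __init__(self, scene_dim=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(scene_dim + 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, scene_feat, goals_xy):
        g = goals_xy.shape[0]
        x = torch.cat([scene_feat.expand(g, -1), goals_xy], dim=1)
        return self.net(x).squeeze(-1)                   # one score per goal

class NeuralPlanner(nn.Module):
    """Step 2 stand-in: decode one trajectory per goal in a single batched
    forward pass, instead of running a 1-5 ms optimizer per goal."""
    def __init__(self, scene_dim=256, horizon=30, hidden=256):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(nn.Linear(scene_dim + 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, horizon * 2))
    def forward(self, scene_feat, goals_xy):
        g = goals_xy.shape[0]
        x = torch.cat([scene_feat.expand(g, -1), goals_xy], dim=1)
        return self.net(x).view(g, self.horizon, 2)      # (G, T, 2) waypoints

def score_trajectories(trajs, learned_scores, w_comfort=0.5, w_nn=1.0):
    """Step 3 stand-in: mix a hand-crafted comfort cost with a learned score.
    The weights and the cost itself are illustrative, not Tesla's metrics."""
    acc = trajs[:, 2:] - 2 * trajs[:, 1:-1] + trajs[:, :-2]  # discrete accel.
    comfort_cost = acc.norm(dim=-1).mean(dim=1)
    return w_nn * learned_scores - w_comfort * comfort_cost

scene_feat = torch.randn(256)                     # encoded scene (placeholder)
goals = torch.randn(50, 2)                        # 50 candidate goal positions
keep = torch.topk(GoalScorer()(scene_feat, goals), k=8).indices
trajs = NeuralPlanner()(scene_feat, goals[keep])  # 8 trajectories in parallel
best = score_trajectories(trajs, torch.randn(8)).argmax()
```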

By decoupling these three steps, Tesla has implemented an efficient trajectory planning module that takes interaction into account. There are not many papers on neural-network-based trajectory planning to reference. I have published a closely related paper, TNT [5], which likewise decomposes the trajectory prediction problem into the same three steps: goal scoring, trajectory planning, and trajectory scoring. Interested readers can check out the details. In addition, our research group has been exploring problems of behavioral interaction and planning, and everyone is welcome to follow our latest work, InterSim [6].


Interaction Search Planning Model Structure

03 Vector Map Lanes Network

Personally, I think the other major technical highlight of this AI Day is Lanes Network, the online vector map construction model. Those who followed last year's AI Day may remember that Tesla already performed full online map segmentation and recognition in BEV space. So why build Lanes Network as well? Because pixel-level lane segmentation is not enough for trajectory planning; we also need the topology of the lane lines, so that the car knows how it can go from one lane to another.

Let's first look at what a vector map is. As shown in the figure, Tesla's vector map consists of a series of blue lane centerlines and some key points (connection points, fork points, merge points), with their connectivity expressed as a graph.


Vector map: the dots are the key points of the lane lines, and the blue curves are the lane centerlines
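As a data structure, such a vector map can be sketched as a small directed graph; the field names and key-point types below are hypothetical, loosely following the taxonomy in the figure.

```python
from dataclasses import dataclass, field

@dataclass
class LaneNode:
    """One key point of the vector map."""
    x: float
    y: float
    kind: str                     # e.g. "start", "continue", "fork", "merge"

@dataclass
class VectorMap:
    """Lane centerlines as a directed graph: key points plus connectivity."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)    # (from_idx, to_idx) pairs

    def successors(self, idx):
        """Nodes reachable from idx, which is what lane-change planning needs."""
        return [j for i, j in self.edges if i == idx]

vmap = VectorMap()
vmap.nodes += [LaneNode(0.0, 0.0, "start"), LaneNode(10.0, 0.5, "fork")]
vmap.edges.append((0, 1))
print(vmap.successors(0))         # -> [1]
```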

In terms of model structure, Lanes Network is a decoder built on the backbone of the perception network. Compared with decoding the occupancy and semantics of each voxel, decoding a sparse set of connected lane lines is harder, because the number of outputs is not fixed and there are logical relationships among the outputs.

Tesla borrows the Transformer decoder from language models and outputs the results autoregressively, one token at a time. Concretely, one must first choose a generation order (such as left to right, top to bottom) and discretize the space (tokenization); Lanes Network can then predict a sequence of discrete tokens. As shown in the figure, the network first predicts a node's coarse position (index 18) and precise position (index 31), then predicts the node's semantics ("Start", i.e., the starting point of a lane line), and finally predicts connectivity attributes such as fork/merge and curvature parameters. The network generates all lane-line nodes in this autoregressive manner.


Lanes Network structure
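The decoding loop itself is the same greedy, feed-the-last-token-back pattern used by language models. Below is a toy sketch; the decoder architecture, the token vocabulary, and the start/end token ids are all assumptions for illustration, not the actual Lanes Network.

```python
import torch
import torch.nn as nn

class TinyLaneDecoder(nn.Module):
    """Toy stand-in for the lane decoder: embeds the token sequence, cross-
    attends over image features, and predicts the next token. The vocabulary
    layout (position, semantic, and end tokens) is assumed."""
    def __init__(self, vocab=1024, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, vocab)
    def forward(self, tokens, memory):
        x = self.embed(tokens).unsqueeze(0)       # (1, L, dim)
        x, _ = self.attn(x, memory, memory)       # cross-attend to image feats
        return self.head(x[0, -1])                # logits for the next token

def generate_lanes(decoder, memory, start=0, end=1, max_len=64):
    """Greedy autoregressive decoding: each predicted token (position, then
    semantics, then connectivity) is appended and fed back in."""
    tokens = [start]
    for _ in range(max_len):
        nxt = int(decoder(torch.tensor(tokens), memory).argmax())
        tokens.append(nxt)
        if nxt == end:                            # all lane nodes emitted
            break
    return tokens

memory = torch.randn(1, 196, 128)                 # flattened image features
print(generate_lanes(TinyLaneDecoder(), memory))
```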

Note that autoregressive sequence generation is not the exclusive preserve of language Transformer models. Our research group has published two related papers on vector map generation in the past few years, HDMapGen [7] and VectorMapNet [8]. HDMapGen uses a graph attention network (GAT) to autoregressively generate the key points of a vector map, which is similar to Tesla's solution; VectorMapNet instead tackles the problem with a Detection Transformer (DETR), using set prediction to generate vector maps more quickly.


HDMapGen vector map generation results


VectorMapNet vector map generation results

04 Autolabeling

Auto labeling is another technology that Tesla explained at last year's AI Day; this year's focus is automatic annotation for Lanes Network. Tesla vehicles generate 500,000 driving trips every day, and making good use of this driving data can greatly improve lane-line prediction.

Tesla's automatic lane annotation has three steps:

1. Estimate a high-precision trajectory for every trip using visual-inertial odometry.

2. Reconstruct the map from multiple vehicles and multiple trips; this is the most critical step of the scheme. The basic motivation is that different vehicles may observe the same location from different viewpoints and at different times, so aggregating this information yields a better map reconstruction. The technical points of this step include geometric matching between maps and joint optimization of the results.

3. Automatically label lanes for new trips. Once a high-precision offline map reconstruction is available, a new trip can be matched to it with a simple geometric alignment to obtain pseudo ground truth (pseudolabels) for its lane lines. Pseudolabels obtained this way are sometimes even better than manual annotation (at night, or in rain and fog). A toy sketch of this matching step follows the list.
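Here is a toy sketch of the geometric matching in step 3, reduced to a radius query of reconstructed lane points around each ego position. A real system would also align headings and jointly optimize the fit; the 30 m radius is an arbitrary illustrative value.

```python
import numpy as np

def pseudo_label_lanes(trip_positions, map_lane_points, radius=30.0):
    """Attach reconstructed lane points to a new trip by a simple radius query.

    trip_positions:  (T, 2) ego positions of the new trip
    map_lane_points: (M, 2) lane points from the offline map reconstruction
    Returns a per-frame list of nearby lane points as pseudo ground truth.
    """
    labels = []
    for pos in trip_positions:
        dist = np.linalg.norm(map_lane_points - pos, axis=1)
        labels.append(map_lane_points[dist < radius])  # lane points near this frame
    return labels

# Toy usage: a straight trip against a reconstructed lane 2 m to the left.
trip = np.stack([np.arange(10.0), np.zeros(10)], axis=1)
lane = np.stack([np.arange(100.0) * 0.5, np.full(100, 2.0)], axis=1)
pseudo = pseudo_label_lanes(trip, lane)
```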


Lanes Network automatic annotation

05 Simulation

Simulation of visual imagery has been a popular direction in computer vision in recent years. In autonomous driving, the main purpose of visual simulation is to generate rare scenes in a targeted way, so that one no longer has to rely on luck in real road tests; for example, the scene of a large truck lying across the middle of the road has always been a headache for Tesla. Visual simulation is not a simple problem, though. For a complex intersection (Market Street in San Francisco), a traditional modeling-and-rendering pipeline takes a designer two weeks, whereas Tesla's AI-based solution now takes only 5 minutes.


Intersection reconstructed by visual simulation

Specifically, the prerequisite for visual simulation is automatically labeled real-world road information plus a rich library of graphics assets. The following steps are then performed in sequence:

1. Road surface generation: fill in the road surface according to the curbs, including details such as slope and material.

2. Lane line generation: draw the lane-line information onto the road surface.

3. Plant and building generation: randomly generate and render plants and houses along and between the roads. The purpose is not only visual appeal; it also simulates the occlusions these objects cause in the real world.

4. Other road elements: traffic lights, street signs, and the imported lanes and their connectivity.

5. Dynamic elements: add vehicles and pedestrians. A sketch of the whole pipeline follows this list.
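Read as code, the pipeline is simply five passes applied in order. The sketch below uses trivial stand-in functions, each of which only records its layer; none of the names correspond to Tesla's actual tooling.

```python
def fill_road_surface(scene):   # 1. slope + material inside the curbs
    scene["layers"].append("road_surface"); return scene
def paint_lane_lines(scene):    # 2. lane markings on the surface
    scene["layers"].append("lane_lines"); return scene
def scatter_props(scene):       # 3. plants/buildings, for looks and occlusion
    scene["layers"].append("plants_buildings"); return scene
def add_road_elements(scene):   # 4. lights, signs, lane connectivity
    scene["layers"].append("signs_lights"); return scene
def spawn_agents(scene):        # 5. vehicles and pedestrians
    scene["layers"].append("agents"); return scene

def build_simulated_scene(curbs):
    """Apply the five generation passes in order; each stand-in just records
    its layer where real tooling would emit geometry and materials."""
    scene = {"curbs": curbs, "layers": []}
    for step in (fill_road_surface, paint_lane_lines, scatter_props,
                 add_road_elements, spawn_agents):
        scene = step(scene)
    return scene

print(build_simulated_scene(curbs=[])["layers"])
```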

06 Infrastructure

Finally, a brief word on the foundation of this whole series of software technologies: Tesla's powerful infrastructure. Tesla's supercomputing center has 14,000 GPUs and a total of 30 PB of data cache, and 500,000 new videos flow into these supercomputers every day. To process this data more efficiently, Tesla developed an accelerated video-decoding library, as well as a file format, .smol, that speeds up the reading and writing of intermediate features. Tesla has also developed its own chip, Dojo, for the supercomputing center, which we will not cover here.


The supercomputing center for video model training

07 Summary

With the AI Day releases of the past two years, we have gradually come to see Tesla's technical landscape in automated (assisted) driving. We have also seen Tesla constantly iterating on itself, moving from 2D perception to BEV perception to the Occupancy Network. Autonomous driving is a journey of a thousand miles. What supports the evolution of Tesla's technology? I think there are three things: the full-scene understanding brought by visual algorithms, the model iteration speed supported by massive computing power, and the generalization brought by massive data. Are these not exactly the three pillars of the deep learning era?
