Zosi Auto R&D released "2022 China Autonomous Driving Data Closed-Loop Research Report".
Nowadays, autonomous driving sensor solutions and computing platforms have become increasingly homogeneous, and suppliers The technological gap is narrowing day by day. In the past two years, autonomous driving technology iterations have advanced rapidly, and mass production has accelerated. According to Zuosi Data Center, in 2021, the cumulative number of domestic L2 assisted driving passenger vehicles will reach 4.79 million, a year-on-year increase of 58.0%. From January to June 2022, the penetration rate of China's L2 assisted driving in the new passenger car market climbed to 32.4%.
For autonomous driving, data runs through the entire life cycle of research and development, testing, mass production, operation and maintenance. With the rapid increase in the number of smart connected car sensors, the amount of data generated by ADAS and autonomous vehicles has also increased exponentially, from GB to TB, PB, EB, and even ZB in the future. Only by using data-driven car evolution to meet the personalized needs of users can car companies go far.
According to the "Safety Guidelines for Automobile Collection Data Processing", automobile collection data refers to the data collected by automobile sensing equipment and control units, as well as the data generated after processing, which can be detailed It is divided into off-vehicle data, cockpit data, operating data and location trajectory data, etc.
According to the "Several Regulations on Automobile Data Security Management (Trial)" promulgated by the Cyberspace Administration of China in August 2021, the collection, analysis, storage, transmission, query, and application of automobile data , deletion and other entire processes have been specified in detail. In the process of car data processing, we adhere to the data processing principles of "in-car processing", "no collection by default", "applicable accuracy range" and "desensitization processing" to reduce the disorderly collection and illegal abuse of car data. In the development process of autonomous driving technology, data collection and processing must first be legal and compliant.
From car cameras, millimeter wave radar, lidar The large amounts of unstructured data (images, videos, speech) collected by ultrasonic and ultrasonic radars can be raw and chaotic. To make data meaningful, it needs to be cleaned, structured, and organized. Data from multiple sources is first imported into the appropriate repository, the data format is standardized, and aggregated according to relevant rules. It then checks for corrupted, duplicate or missing data points and discards unnecessary data that may affect the overall quality of the data set. Finally, labels are used to classify videos captured under different conditions, such as day, night, sunny, rainy, etc. This step provides clean, structured data that will be used for training and validation.
The structured data that has been cleaned after data collection needs to be Label. Annotation is the process of assigning coded values to raw data. Encoded values include, but are not limited to, assigning class labels, drawing bounding boxes, and marking object boundaries. High-quality annotations are needed to teach supervised learning models what objects are and to measure the performance of trained models.
In the field of autonomous driving, the scenarios for data annotation processing usually include lane changing and overtaking, passing through intersections, unprotected left turns and right turns without traffic light control. Turns, and some complex long-tail scenes such as vehicles running red lights, pedestrians crossing the road, vehicles parked illegally on the roadside, etc.
Commonly used annotation tools include general image drawing, lane line annotation, driver face annotation, 3D point cloud annotation, 2D/3D fusion annotation, panoramic semantic segmentation, etc. Due to the development of big data and the increase in the number of large data sets, the use of data annotation tools continues to expand rapidly.
Nowadays, the frequency of data collection has entered the millisecond level, requiring It is high-precision data of thousands of signal dimensions (such as bus signals, sensor internal states, software buried points, user behavior and environment perception data, etc.), while avoiding data loss, disorder, jumps and delays, and at high Under the premise of high accuracy and high quality, transmission/storage costs are greatly reduced. The uplink and downlink of Internet of Vehicles data are relatively long (from the vehicle MCU, DCU, gateway, 4G/5G to the cloud) and it is necessary to ensure the data transmission quality of each link node.
In response to the new changes in data transmission, some companies have been able to provide efficient data collection and vehicle-cloud integrated transmission solutions, such as Zhixiehui and EXCEEDDATA flexible data acquisition platform solutions, which are based on real-time data in the vehicle edge computing environment. 10 millisecond-level real-time computing is used to trigger the flexible data collection and upload function. The uploaded data has been calculated and filtered, significantly reducing the amount of uploaded data. In addition, the original signal from the vehicle is compressed and stored 100-300 times losslessly. The cloud management platform saves the high-quality signal from the vehicle without loss and high compression ratio. It supports the issuance of data acquisition algorithms, the triggering of multiple acquisition modes, and the real-time upload of collected data. One-click download to the business desktop, multiple flexible filtering by vehicle, event, time period, etc., easy to use and solve, separation of storage and calculation, realizing a closed loop of vehicle-cloud isomorphic data collection-calculation-uploading-processing; In 2021, the first domestic mass-produced model equipped with the EXCEEDDATA solution has been launched (HiPhiX).
Source: Zhixiehuitong
In order to perceive the surrounding environment more clearly, self-driving cars are equipped with more sensors and generate a large amount of data. Some high-level autonomous driving systems are even equipped with more than 40 various sensors to accurately perceive the 360° environment around the vehicle. The development of autonomous driving systems requires multiple steps such as data collection, data aggregation, cleaning and labeling, model training, simulation, and big data analysis. This process involves the aggregation and storage of massive data, data flow between different systems in different links, and Read and write massive amounts of data during model training. Data faces new challenges with storage bottlenecks.
To this end, the technologies and capabilities of many cloud service providers in this area have become the key to helping car companies win. For example, Amazon Cloud Technology AWS uses the autonomous driving data lake as the center to help car companies build an end-to-end autonomous driving data closed loop. Use Amazon Simple Storage Service (Amazon S3, cloud object storage service) to build an autonomous driving data lake to achieve data collection, data management and analysis, data annotation, model and algorithm development, simulation verification, map development, DevOps and MLOps, and car companies It can more easily realize the development, testing and application of the entire process of autonomous driving.
Source: AWS
Among domestic technology giants, take Baidu data closed-loop solution as an example, its data storage provides Data retrieval service for roadside and vehicle multi-source data information, used for massive data search on the business platform, with multi-dimensional retrieval (vehicle information, mileage, autonomous driving duration, etc.), management of the entire life cycle from data production to destruction, Supports advantages such as panoramic data view, data traceability, and open data sharing.
Source: Baidu
The development of autonomous driving has shifted from technology-driven to data-driven, but data-driven business models face many difficulties.
It is difficult to process massive data: The amount of data collected by high-level autonomous driving test vehicles every day is TB level, and the development team needs PB level storage space, but these data can be used for training The value data accounts for less than 5%. In addition, there are strict safety compliance requirements for data collected by sensors such as vehicle cameras, lidar, and high-precision positioning, which undoubtedly poses great challenges to the access, storage, desensitization, and processing of massive data.
The cost of data annotation is high: Data annotation takes up a lot of manpower and time costs. With the development of high-level autonomous driving capabilities, the complexity of scenarios continues to increase, and more difficult scenarios will appear. Improving the accuracy of the vehicle perception model places higher requirements on the scale and quality of the training data set. Traditional manual annotation has been unable to meet the demand for massive data sets for model training in terms of efficiency and cost.
Simulation test efficiency is low: Virtual simulation is an effective means to accelerate the training of autonomous driving algorithms, but simulation scenarios are difficult to construct and have low degree of restoration, especially some complex and dangerous scenarios, which are difficult to construct. In addition, the parallel simulation capability is insufficient, the efficiency of simulation testing is low, and the iteration cycle of the algorithm is too long.
High-precision map coverage is low: High-precision maps mainly rely on self-collection and self-made maps, and only meet the scenarios of designated roads in the experimental stage. In the future, it will be commercialized and expanded to urban streets in major cities across the country. It will face very prominent challenges in terms of coverage, dynamic updates, as well as cost and efficiency.
In order to solve various difficulties and problems, efficient development of autonomous driving requires the construction of an efficient data closed-loop system.
Source: Freetech
As far as the autonomous driving data closed loop is concerned, in Corner Cases need to be constantly solved during the implementation of autonomous driving. For this purpose, sufficient data samples and convenient vehicle-side verification methods must be available. Shadow mode is one of the best solutions for solving Corner Cases.
Shadow mode was proposed by Tesla in April 2019 and applied to the vehicle to compare relevant decisions and trigger data upload. The self-driving software on the sold vehicle is used to continuously record the data detected by the sensors and selectively transmit it back at the appropriate time for machine learning and improvement of the original self-driving algorithm.
Dojo supercomputer can use massive video data for unsupervised annotation and training.
In 2021, Tesla delivered 936,200 vehicles globally, of which 484,100 were delivered by the Chinese factory. In the first half of 2022, 560,000 vehicles will be delivered. Tesla takes advantage of mass production and continuously optimizes algorithms through shadow mode. Using shadow mode, millions of sold vehicles are used as test vehicles to capture surrounding perceptions and special road conditions, and continuously strengthen the ability to predict, avoid, and learn from uncertain events. Because there are millions of sold vehicles to support it, the coverage of Corner Cases and extreme working conditions will be more comprehensive. The high-quality data collected by flexible triggering can iterate out better algorithms, and the excellence of algorithm iteration determines The value of software. In terms of software upgrade subscription services, the explosive power of data closed loop has just emerged.
The premise of continuous iteration of the autonomous driving system is the continuous optimization of the algorithm, and the algorithm The excellence depends on the efficiency of the data closed-loop system. The efficient flow of data in each scenario of autonomous driving development is crucial. Data intelligence will become the key to accelerating the mass production of autonomous driving.
In December 2021, HaoMo Zhixing officially released MANA Xuehu, China’s first autonomous driving data intelligence system, which accelerates autonomous driving from the five major capabilities of perception, cognition, annotation, simulation, and calculation. The evolution of driving technology. In the next three years, the assisted driving system can be installed on more than 1 million passenger cars. Relying on its fully self-developed autonomous driving system, Haomo Zhixing has achieved significant advantages in the accumulation, processing, and application of data. Massive data brings the advantage of technological iteration. The advantages of cost reduction and efficiency increase are obvious.
For another example, Momenta has achieved leading full-process data-driven technical capabilities, including algorithm modules such as perception, fusion, prediction, and control, which can be iterated efficiently in a data-driven manner. with updates. Its Closed Loop Automation is a set of tool chains that allow data flow to drive automatic iteration of data-driven algorithms. CLA can automatically filter out massive amounts of golden data, drive the automatic iteration of algorithms, and make the self-driving flywheel spin faster and faster.
Source: Momenta
In the context of software-defined cars, data, algorithms and computing power are the troika of autonomous driving development. The research and development cycles of car companies are shortened and function iterations are accelerated. In the future, being able to continuously collect data at low cost, high efficiency and high efficiency, and iterate algorithms through real data to ultimately form a data closed loop and business closed loop is the key to the sustainable development of autonomous driving companies.
The above is the detailed content of Data closed-loop research: The development of autonomous driving shifts from technology-driven to data-driven. For more information, please follow other related articles on the PHP Chinese website!