RenderOcc: the first paradigm for training multi-view 3D occupancy models using only 2D labels

王林
Release: 2023-09-30 08:49:06

This article is reprinted with permission from the Autonomous Driving Heart WeChat official account. Please contact the original source before reprinting.

RenderOcc is the first paradigm for training multi-view 3D occupancy models using only 2D labels. The authors extract NeRF-style 3D volume representations from multi-view images and use volume rendering to produce 2D renderings, which lets 2D semantic and depth labels supervise the 3D representation directly and reduces reliance on expensive 3D occupancy annotations. Extensive experiments show that RenderOcc performs comparably to models fully supervised with 3D labels, which underlines its value for real-world applications. The code is open source.

Title: RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision

Author affiliations: Peking University; Xiaomi Car; MMLab, The Chinese University of Hong Kong

Open-source code: GitHub - pmj110119/RenderOcc

3D occupancy prediction, which quantizes a 3D scene into grid cells with semantic labels, holds significant promise for robot perception and autonomous driving. Recent work mainly relies on complete occupancy labels in 3D voxel space for supervision. However, the annotation process is expensive and the labels are sometimes ambiguous, which severely limits the usability and scalability of 3D occupancy models. To address this, the authors propose RenderOcc, a new paradigm for training 3D occupancy models using only 2D labels. Specifically, they extract NeRF-style 3D volumetric representations from multi-view images and use volume rendering to produce 2D renderings, enabling direct 3D supervision from 2D semantic and depth labels. In addition, the authors introduce an Auxiliary Ray method to tackle the sparse-viewpoint problem in autonomous driving scenes: it uses sequential frames to build a comprehensive set of 2D renderings for each object. RenderOcc is the first attempt to train a multi-view 3D occupancy model using only 2D labels, reducing the reliance on expensive 3D occupancy annotations. Extensive experiments show that RenderOcc performs comparably to models fully supervised with 3D labels, which underlines the value of this approach in real-world applications.

Network structure:


Figure 1. The new training paradigm of RenderOcc. Unlike previous methods that rely on expensive 3D occupancy labels for supervision, RenderOcc trains the 3D occupancy network with 2D labels. With 2D rendering supervision, the model benefits from fine-grained, pixel-level 2D semantic and depth supervision.


Figure 2. Overall framework of RenderOcc. A 2D-to-3D network extracts volumetric features and predicts a density and a semantic label distribution for each voxel. Together these form a Semantic Density Field, which is volume-rendered to produce 2D semantics and depth. To generate the ray ground truth (Rays GT), auxiliary rays are extracted from adjacent frames to supplement the rays of the current frame, and the proposed weighted ray sampling strategy is used to purify them. The loss is then computed between the Rays GT and the rendered 2D semantics and depth, which achieves rendering supervision from 2D labels.
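To make the rendering step in Figure 2 concrete, the following is a minimal NumPy sketch of NeRF-style volume rendering along a single ray. It assumes the standard NeRF compositing weights (alpha from density, accumulated transmittance); the function names `render_ray` and `ray_loss`, the array shapes, and the choice of cross-entropy plus an L1 depth term are illustrative assumptions, not the losses or code of the RenderOcc repository.

```python
import numpy as np

def render_ray(density, sem_logits, t_vals):
    """Composite per-sample density and semantics into one 2D pixel.

    density:    (N,)   non-negative density at N samples along the ray
    sem_logits: (N, C) per-sample class logits
    t_vals:     (N,)   increasing sample distances from the ray origin
    """
    # Interval lengths between samples; the last interval repeats the previous one.
    deltas = np.diff(t_vals, append=t_vals[-1] + (t_vals[-1] - t_vals[-2]))
    # Opacity of each interval.
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    # Rendered depth is the weight-averaged sample distance.
    depth = np.sum(weights * t_vals)
    # Softmax over classes, then weight-average along the ray.
    e = np.exp(sem_logits - sem_logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    semantics = weights @ probs
    return depth, semantics, weights

def ray_loss(depth, semantics, gt_depth, gt_class, eps=1e-8):
    # Assumed loss for illustration: cross-entropy on the rendered class
    # probabilities plus an L1 penalty on the rendered depth.
    ce = -np.log(semantics[gt_class] + eps)
    return ce + abs(depth - gt_depth)

# Usage: one opaque sample at distance 5 that belongs to class 2.
t_vals = np.linspace(1.0, 10.0, 10)
density = np.zeros(10)
density[4] = 1e3
sem_logits = np.zeros((10, 4))
sem_logits[4, 2] = 10.0
depth, semantics, weights = render_ray(density, sem_logits, t_vals)
loss = ray_loss(depth, semantics, gt_depth=5.0, gt_class=2)
```

Because the rendered depth and semantics are differentiable functions of the per-sample density and logits, a 2D label on a pixel can push gradients back to every voxel the ray passes through, which is what allows 2D labels to supervise a 3D field.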


Figure 3. Auxiliary rays. A single frame cannot capture multi-view information about an object well: adjacent cameras overlap only in a small region, and the difference in viewing angle is limited. By introducing auxiliary rays from adjacent frames, the model benefits significantly from multi-view consistency constraints.
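One way to picture auxiliary rays is as rays defined in an adjacent frame's ego coordinates and re-expressed in the current frame so they can be rendered through the same volume. The sketch below assumes each frame provides a 4x4 ego-to-world pose matrix and that the scene is static between frames; the function name `rays_to_current_frame` and its arguments are hypothetical, and the weighted ray sampling that the article says is used to purify these rays is not reproduced here.

```python
import numpy as np

def rays_to_current_frame(origins, dirs, adj_to_world, cur_to_world):
    """Re-express rays from an adjacent frame in the current frame.

    origins, dirs: (M, 3) ray origins and directions in the adjacent frame
    adj_to_world:  (4, 4) pose of the adjacent frame (assumed input)
    cur_to_world:  (4, 4) pose of the current frame (assumed input)
    """
    # Chain the poses: adjacent frame -> world -> current frame.
    adj_to_cur = np.linalg.inv(cur_to_world) @ adj_to_world
    R, t = adj_to_cur[:3, :3], adj_to_cur[:3, 3]
    # Origins are points: rotate and translate.
    new_origins = origins @ R.T + t
    # Directions are vectors: rotate only.
    new_dirs = dirs @ R.T
    return new_origins, new_dirs

# Usage: the adjacent frame sits 2 m ahead along x and is yawed by 90 degrees.
cur_to_world = np.eye(4)
adj_to_world = np.eye(4)
adj_to_world[:3, :3] = np.array([[0.0, -1.0, 0.0],
                                 [1.0,  0.0, 0.0],
                                 [0.0,  0.0, 1.0]])
adj_to_world[:3, 3] = [2.0, 0.0, 0.0]
origins = np.zeros((1, 3))
dirs = np.array([[1.0, 0.0, 0.0]])
new_origins, new_dirs = rays_to_current_frame(origins, dirs, adj_to_world, cur_to_world)
```

The transformed rays keep the 2D semantic and depth labels of their source pixels, so they add viewpoints of the same region without any extra annotation. Moving objects violate the static-scene assumption made here, which is one reason some filtering or reweighting of auxiliary rays is needed in practice.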

Experimental results:


Original link: https://mp.weixin.qq.com/s/WzI8mGoIOTOdL8irXrbSPQ

source:51cto.com