Table of Contents
Paper idea:
Network Design:
Home Technology peripherals AI UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!

UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!

Sep 16, 2023 pm 08:29 PM
intelligent Vision

Original title: UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering

Please click the following link to view the paper: https://arxiv.org/pdf/2306.09117.pdf

UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!

Paper idea:

In this technical report, we propose a solution called UniOCC for use in nuScenes at CVPR 2023 Vision-centered 3D occupancy prediction trajectories are performed in the Open Dataset Challenge. Existing occupancy prediction methods mainly focus on using 3D occupancy labels to optimize the projected characteristics of the 3D volumetric space. However, the generation process of these labels is very complex and expensive (relying on 3D semantic annotation), and is limited by voxel resolution and cannot provide fine-grained spatial semantics. To address this limitation, we propose a new unified occupancy (UniOcc) prediction method that explicitly imposes spatial geometric constraints and supplements fine-grained semantic supervision with volume ray rendering. Our method significantly improves model performance and shows good potential in reducing manual annotation costs. Considering the laboriousness of annotating 3D occupancies, we further propose the depth-aware Teacher Student (DTS) framework to improve the prediction accuracy using unlabeled data. Our solution achieved 51.27% mIoU on the official single-model ranking, ranking third in this challenge

Network Design:

Here As part of this challenge, this paper proposes UniOcc, a general solution that leverages volume rendering to unify 2D and 3D representation supervision, improving multi-camera occupancy prediction models. This paper does not design a new model architecture, but focuses on enhancing existing models [3, 18, 20] in a versatile and plug-and-play manner.

Re-written as follows: This paper implements the function of generating 2D semantic and depth maps using volume rendering by upgrading the representation to NeRF-style representation [1,15,21]. This enables fine-grained supervision at the 2D pixel level. By ray sampling three-dimensional voxels, the rendered two-dimensional pixel semantics and depth information can be obtained. By explicitly integrating geometric occlusion relationships and semantic consistency constraints, this paper provides explicit guidance for the model and ensures compliance with these constraints. It is worth mentioning that UniOcc has the potential to reduce the need for expensive 3D semantic annotation. dependence. In the absence of 3D occupancy labels, models trained using only our volume rendering supervision perform even better than models trained using 3D label supervision. This highlights the exciting potential to reduce reliance on expensive 3D semantic annotations, as scene representations can be learned directly from affordable 2D segmentation labels. In addition, using advanced technologies such as SAM [6] and [14,19] can further reduce the cost of 2D segmentation annotation.

This article also introduces the Deep Sensing Teacher-Student (DTS) framework, a self-supervised training method. Unlike the classic Mean Teacher, DTS enhances the deep prediction of the teacher model, achieving stable and effective training while utilizing unlabeled data. Furthermore, this paper applies some simple yet effective techniques to improve the performance of the model. This includes using visible masks in training, using a stronger pre-trained backbone network, increasing voxel resolution, and implementing test-time data augmentation (TTA)

UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering! following Here is an overview of the UniOcc framework: Figure 1

UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!Figure 2. Depth-aware Teacher-Student framework.

Experimental results:

UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!

UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!#Quote:

Pan, M., Liu, L., Liu, J., Huang, P., Wang, L., Zhang, S., Xu, S., Lai, Z., Yang, K. (2023) . UniOcc: Unifying geometric and semantic rendering with vision-centric 3D occupancy prediction. ArXiv. / abs / 2306.09117

Original link: https://mp.weixin.qq.com/s/iLPHMtLzc5z0f4bg_W1vIgUniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!

The above is the detailed content of UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
Two Point Museum: All Exhibits And Where To Find Them
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Smart App Control on Windows 11: How to turn it on or off Smart App Control on Windows 11: How to turn it on or off Jun 06, 2023 pm 11:10 PM

Intelligent App Control is a very useful tool in Windows 11 that helps protect your PC from unauthorized apps that can damage your data, such as ransomware or spyware. This article explains what Smart App Control is, how it works, and how to turn it on or off in Windows 11. What is Smart App Control in Windows 11? Smart App Control (SAC) is a new security feature introduced in the Windows 1122H2 update. It works with Microsoft Defender or third-party antivirus software to block potentially unnecessary apps that can slow down your device, display unexpected ads, or perform other unexpected actions. Smart application

Beyond ORB-SLAM3! SL-SLAM: Low light, severe jitter and weak texture scenes are all handled Beyond ORB-SLAM3! SL-SLAM: Low light, severe jitter and weak texture scenes are all handled May 30, 2024 am 09:35 AM

Written previously, today we discuss how deep learning technology can improve the performance of vision-based SLAM (simultaneous localization and mapping) in complex environments. By combining deep feature extraction and depth matching methods, here we introduce a versatile hybrid visual SLAM system designed to improve adaptation in challenging scenarios such as low-light conditions, dynamic lighting, weakly textured areas, and severe jitter. sex. Our system supports multiple modes, including extended monocular, stereo, monocular-inertial, and stereo-inertial configurations. In addition, it also analyzes how to combine visual SLAM with deep learning methods to inspire other research. Through extensive experiments on public datasets and self-sampled data, we demonstrate the superiority of SL-SLAM in terms of positioning accuracy and tracking robustness.

The facial features are flying around, opening the mouth, staring, and raising eyebrows, AI can imitate them perfectly, making it impossible to prevent video scams The facial features are flying around, opening the mouth, staring, and raising eyebrows, AI can imitate them perfectly, making it impossible to prevent video scams Dec 14, 2023 pm 11:30 PM

With such a powerful AI imitation ability, it is really impossible to prevent it. It is completely impossible to prevent it. Has the development of AI reached this level now? Your front foot makes your facial features fly, and on your back foot, the exact same expression is reproduced. Staring, raising eyebrows, pouting, no matter how exaggerated the expression is, it is all imitated perfectly. Increase the difficulty, raise the eyebrows higher, open the eyes wider, and even the mouth shape is crooked, and the virtual character avatar can perfectly reproduce the expression. When you adjust the parameters on the left, the virtual avatar on the right will also change its movements accordingly to give a close-up of the mouth and eyes. The imitation cannot be said to be exactly the same, but the expression is exactly the same (far right). The research comes from institutions such as the Technical University of Munich, which proposes GaussianAvatars, which

The first pure visual static reconstruction of autonomous driving The first pure visual static reconstruction of autonomous driving Jun 02, 2024 pm 03:24 PM

A purely visual annotation solution mainly uses vision plus some data from GPS, IMU and wheel speed sensors for dynamic annotation. Of course, for mass production scenarios, it doesn’t have to be pure vision. Some mass-produced vehicles will have sensors like solid-state radar (AT128). If we create a data closed loop from the perspective of mass production and use all these sensors, we can effectively solve the problem of labeling dynamic objects. But there is no solid-state radar in our plan. Therefore, we will introduce this most common mass production labeling solution. The core of a purely visual annotation solution lies in high-precision pose reconstruction. We use the pose reconstruction scheme of Structure from Motion (SFM) to ensure reconstruction accuracy. But pass

What is NeRF? Is NeRF-based 3D reconstruction voxel-based? What is NeRF? Is NeRF-based 3D reconstruction voxel-based? Oct 16, 2023 am 11:33 AM

1 Introduction Neural Radiation Fields (NeRF) are a fairly new paradigm in the field of deep learning and computer vision. This technology was introduced in the ECCV2020 paper "NeRF: Representing Scenes as Neural Radiation Fields for View Synthesis" (which won the Best Paper Award) and has since become extremely popular, with nearly 800 citations to date [1 ]. The approach marks a sea change in the traditional way machine learning processes 3D data. Neural radiation field scene representation and differentiable rendering process: composite images by sampling 5D coordinates (position and viewing direction) along camera rays; feed these positions into an MLP to produce color and volumetric densities; and composite these values ​​using volumetric rendering techniques image; the rendering function is differentiable, so it can be passed

MotionLM: Language modeling technology for multi-agent motion prediction MotionLM: Language modeling technology for multi-agent motion prediction Oct 13, 2023 pm 12:09 PM

This article is reprinted with permission from the Autonomous Driving Heart public account. Please contact the source for reprinting. Original title: MotionLM: Multi-Agent Motion Forecasting as Language Modeling Paper link: https://arxiv.org/pdf/2309.16534.pdf Author affiliation: Waymo Conference: ICCV2023 Paper idea: For autonomous vehicle safety planning, reliably predict the future behavior of road agents is crucial. This study represents continuous trajectories as sequences of discrete motion tokens and treats multi-agent motion prediction as a language modeling task. The model we propose, MotionLM, has the following advantages: First

Take a look at the past and present of Occ and autonomous driving! The first review comprehensively summarizes the three major themes of feature enhancement/mass production deployment/efficient annotation. Take a look at the past and present of Occ and autonomous driving! The first review comprehensively summarizes the three major themes of feature enhancement/mass production deployment/efficient annotation. May 08, 2024 am 11:40 AM

Written above & The author’s personal understanding In recent years, autonomous driving has received increasing attention due to its potential in reducing driver burden and improving driving safety. Vision-based three-dimensional occupancy prediction is an emerging perception task suitable for cost-effective and comprehensive investigation of autonomous driving safety. Although many studies have demonstrated the superiority of 3D occupancy prediction tools compared to object-centered perception tasks, there are still reviews dedicated to this rapidly developing field. This paper first introduces the background of vision-based 3D occupancy prediction and discusses the challenges encountered in this task. Next, we comprehensively discuss the current status and development trends of current 3D occupancy prediction methods from three aspects: feature enhancement, deployment friendliness, and labeling efficiency. at last

Do you know that programmers will be in decline in a few years? Do you know that programmers will be in decline in a few years? Nov 08, 2023 am 11:17 AM

"ComputerWorld" magazine once wrote an article saying that "programming will disappear by 1960" because IBM developed a new language FORTRAN, which allows engineers to write the mathematical formulas they need and then submit them. Give the computer a run, so programming ends. A few years later, we heard a new saying: any business person can use business terms to describe their problems and tell the computer what to do. Using this programming language called COBOL, companies no longer need programmers. . Later, it is said that IBM developed a new programming language called RPG that allows employees to fill in forms and generate reports, so most of the company's programming needs can be completed through it.

See all articles