Researchers from the Hong Kong University of Science and Technology and Tsinghua University propose GenN2N, a unified generative NeRF-to-NeRF translation framework. It applies to a wide range of NeRF translation tasks, such as text-driven NeRF editing, colorization, super-resolution, and inpainting, and delivers excellent performance.
- Paper address: https://arxiv.org/abs/2404.02788
- Paper homepage: https://xiangyueliu.github.io/GenN2N/
- Github address: https://github.com/Lxiangyue/GenN2N
- Paper title: GenN2N: Generative NeRF2NeRF Translation
In recent years, Neural Radiance Fields (NeRF) have attracted widespread attention in 3D reconstruction, 3D generation, and novel view synthesis thanks to their compactness, high quality, and versatility. However, once a NeRF scene has been created, these methods offer little further control over the resulting geometry and appearance, which has made NeRF editing a research focus.

Current NeRF editing methods are usually task-specific, such as text-driven NeRF editing, super-resolution, inpainting, and colorization, and they require a large amount of task-specific domain knowledge. In 2D image editing, by contrast, developing universal image-to-image translation methods has become a trend; for example, the 2D generative model Stable Diffusion supports versatile image editing. We therefore propose universal NeRF editing built on underlying 2D generative models. The accompanying challenge is the representation gap between NeRF and 2D images: in particular, an image editor typically produces multiple, mutually inconsistent edits for different viewpoints.

A recent text-based NeRF editing method, Instruct-NeRF2NeRF, explores this direction. It adopts a "render-edit-aggregate" process that progressively updates the NeRF scene by rendering multi-view images, editing them, and aggregating the edited images back into the NeRF. However, after heavy optimization for a specific editing request, this approach produces only a single editing result; if the user is not satisfied, the whole process must be repeated.

We therefore propose GenN2N, a general NeRF-to-NeRF framework suitable for a variety of NeRF editing tasks. Its core idea is to model the one-to-many nature of the editing process generatively, so that a large number of editing results satisfying the editing target can be produced easily for the user to choose from. At the core of GenN2N: 1) a generative 3D VAE-GAN framework is introduced, in which the VAE represents the entire editing space and learns the distribution of all possible 3D NeRF edits corresponding to a set of input 2D edited images, while the GAN supervises different views of the edited NeRF to ensure the realism of the editing results; 2) contrastive learning decouples editing content from viewpoint, guaranteeing that the editing content is consistent across views; 3) at inference time, the user simply samples multiple editing codes from the conditional generative model to obtain diverse 3D editing results matching the editing target.
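As a rough illustration of how these three ingredients could fit together during training, here is a minimal PyTorch-style sketch of a VAE-GAN objective of this kind. The function name, the loss weights, and the non-saturating adversarial form are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def vae_gan_losses(edited_views, rendered_views, mu, logvar, disc_scores,
                   w_kl=1e-3, w_adv=1e-2):
    """Illustrative composition of a generative NeRF-to-NeRF objective.

    edited_views:   2D edits used as supervision for the corresponding views
    rendered_views: renderings of the translated NeRF from the same camera poses,
                    conditioned on editing codes z = mu + eps * exp(0.5 * logvar)
    disc_scores:    conditional-discriminator scores for the rendered views
    """
    # Reconstruction: the translated NeRF, rendered from a view, should match
    # the 2D edit that produced that view's editing code.
    loss_rec = F.l1_loss(rendered_views, edited_views)

    # KL term: keep editing codes close to a standard normal so that new codes
    # can simply be sampled at inference time.
    loss_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # Adversarial term (non-saturating): rendered views should fool the
    # conditional discriminator. A contrastive term on the editing codes
    # (sketched later in this article) would be added on top of these.
    loss_adv = F.softplus(-disc_scores).mean()

    return loss_rec + w_kl * loss_kl + w_adv * loss_adv
```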
Compared with the SOTA methods for the various NeRF editing tasks (including an ICCV 2023 Oral method), GenN2N outperforms existing approaches in editing quality, diversity, and efficiency.
We first perform 2D image editing, and then lift these edited 2D images into a 3D NeRF, achieving generative NeRF-to-NeRF translation.
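A minimal sketch of that two-stage data flow, assuming hypothetical helpers for rendering, 2D editing, and the NeRF-to-NeRF fitting stage (none of these names come from the paper or its code):

```python
# Illustrative outline of "edit in 2D, then lift to 3D". The three callables are
# hypothetical stand-ins supplied by the caller, not functions from GenN2N's code.

def edit_then_lift(original_nerf, camera_poses, instruction,
                   render_view, edit_image_2d, fit_translated_nerf):
    # Stage 1: run an off-the-shelf 2D editor on renderings of the original NeRF.
    # The per-view edits are generally inconsistent with one another.
    edited_views = [edit_image_2d(render_view(original_nerf, pose), instruction)
                    for pose in camera_poses]

    # Stage 2: distill the inconsistent 2D edits into a generative,
    # view-consistent translated NeRF.
    return fit_translated_nerf(original_nerf, camera_poses, edited_views)
```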
A. Latent Distill

We use the Latent Distill Module as the VAE encoder to learn an implicit editing code for each edited image; this code controls the content generated during NeRF-to-NeRF translation. A KL loss constrains all editing codes to follow a well-behaved normal distribution, which makes sampling easy. To decouple editing content from viewpoint, we carefully design a contrastive learning scheme that encourages the editing codes of images with the same editing style but different viewpoints to be similar, and the editing codes of images with different editing styles but the same viewpoint to be far apart (a loss of this kind is sketched at the end of this section).

B. NeRF-to-NeRF Translation (Translated NeRF)

We use NeRF-to-NeRF translation as the VAE decoder: it takes an editing code as input and modifies the original NeRF into a translated NeRF. We add residual layers between the hidden layers of the original NeRF network; these layers take the editing code as input and modulate the hidden-layer neurons, so the translated NeRF retains the information of the original NeRF while the editing code controls the 3D translation result. The translation module also serves as the generator in adversarial training. By generating rather than optimizing, we can obtain multiple translation results at once, which significantly improves translation efficiency and result diversity (an illustrative modulated residual block is sketched below).

C. Conditional Discriminator

The rendered images of the translated NeRF constitute the generation space to be judged. Because these images differ in both editing style and rendering viewpoint, the space is very complex, so we provide the discriminator with a condition as additional information. Specifically, when the discriminator classifies a rendering from the generator (negative sample) or an edited image from the training data (positive sample), we select an edited image of the same viewpoint from the training data as the condition; this prevents the discriminator from being distracted by viewpoint differences when separating positive and negative samples (see the sketch below).

After GenN2N is optimized, users can randomly sample editing codes from the normal distribution and feed them to the translated NeRF to generate high-quality, multi-view-consistent edited 3D NeRF scenes.

We conducted extensive experiments on various NeRF-to-NeRF tasks, including text-driven NeRF editing, colorization, super-resolution, and inpainting. The results demonstrate GenN2N's superior editing quality, multi-view consistency, generation diversity, and editing efficiency. We also compared our method qualitatively and quantitatively with the SOTA methods for each specific NeRF task (text-driven editing, colorization, super-resolution, inpainting, etc.). The results show that GenN2N, as a general framework, performs as well as or better than the task-specific SOTA methods, while producing more diverse editing results (for example, in comparisons with Instruct-NeRF2NeRF on text-driven NeRF editing). For more details on the experiments and the method, please refer to the project page.
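For the contrastive decoupling in part A, one plausible instantiation is an InfoNCE-style loss over editing codes, treating codes from the same 2D edit style (rendered from different viewpoints) as positives and codes from different styles as negatives. The loss form, temperature, and batching below are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def edit_code_contrastive_loss(codes: torch.Tensor,
                               style_ids: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over editing codes (illustrative).

    codes:     (N, D) editing codes, one per edited training image
    style_ids: (N,)   id of the 2D edit each image belongs to; codes sharing a
               style but coming from different viewpoints are pulled together,
               codes from different styles are pushed apart
    """
    codes = F.normalize(codes, dim=-1)
    sim = codes @ codes.t() / temperature                     # pairwise similarities
    eye = torch.eye(len(codes), dtype=torch.bool, device=codes.device)
    pos = (style_ids.unsqueeze(0) == style_ids.unsqueeze(1)) & ~eye

    # Softmax over each row, excluding the code's similarity with itself.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)

    # Average log-probability of the positives for each anchor that has any.
    per_anchor = (log_prob * pos.float()).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return -per_anchor.mean()
```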
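For part B, the residual layers inserted between the original NeRF's hidden layers could look roughly like the block below. The FiLM-style scale-and-shift modulation is an assumption on my part; the paper only states that the editing code modulates the hidden-layer neurons.

```python
import torch
import torch.nn as nn

class EditModulatedResidual(nn.Module):
    """Residual block inserted between frozen NeRF hidden layers (illustrative).

    The editing code z modulates the hidden features, so the translated NeRF
    both preserves the original scene and follows the sampled edit.
    """
    def __init__(self, hidden_dim: int, code_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(code_dim, 2 * hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, h: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # FiLM-style modulation of the hidden features by the editing code
        # (z must broadcast against h along the batch dimensions).
        scale, shift = self.to_scale_shift(z).chunk(2, dim=-1)
        residual = self.mlp(h * (1 + scale) + shift)
        return h + residual   # the skip connection keeps the original NeRF information
```

At inference time, one would sample z from a standard normal, pass it through every inserted block alongside the frozen hidden features, and render the translated NeRF as usual; this is what yields many distinct edited scenes from a single trained model.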
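For part C, a conditional discriminator of this kind can be realized by concatenating the image being judged with a same-viewpoint edited image along the channel dimension. The PatchGAN-style architecture below is an illustrative stand-in, not the authors' network:

```python
import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    """PatchGAN-style discriminator conditioned on a same-view edited image (illustrative)."""
    def __init__(self, in_channels: int = 3, base: int = 64):
        super().__init__()
        layers = []
        ch_in, ch_out = in_channels * 2, base      # judged image + same-view condition
        for _ in range(3):
            layers += [nn.Conv2d(ch_in, ch_out, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch_in, ch_out = ch_out, ch_out * 2
        layers += [nn.Conv2d(ch_in, 1, 4, padding=1)]  # per-patch real/fake score
        self.net = nn.Sequential(*layers)

    def forward(self, image: torch.Tensor, same_view_condition: torch.Tensor) -> torch.Tensor:
        # Conditioning on an edited image of the same viewpoint keeps the
        # real/fake decision from being dominated by viewpoint differences.
        return self.net(torch.cat([image, same_view_condition], dim=1))
```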
This paper comes from the team of Professor Tan Ping at the Hong Kong University of Science and Technology, the 3DVICI Lab at Tsinghua University, the Shanghai Artificial Intelligence Laboratory, and the Shanghai Qizhi Research Institute. The authors are Liu Xiangyue, a student at the Hong Kong University of Science and Technology, Xue Han, a student at Tsinghua University, and Luo Kunming, a student at the Hong Kong University of Science and Technology; their advisors are Professor Yi Li of Tsinghua University and Professor Tan Ping of the Hong Kong University of Science and Technology.
GenN2N, the generative editing framework that unifies NeRF translation tasks described above, was accepted to CVPR 2024 with high review scores.