CVPR 2024 high-scoring paper: New generative editing framework GenN2N, unifying NeRF conversion tasks

Apr 19, 2024 pm 09:40 PM

The AIxiv column is our website's column for academic and technical content. Over the past few years it has published more than 2,000 pieces covering top laboratories at major universities and companies around the world, helping to promote academic exchange and dissemination. If you have excellent work that you would like to share, please feel free to submit it or contact us for coverage. Submission email addresses: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.


Researchers from Hong Kong University of Science and Technology and Tsinghua University have proposed GenN2N, a unified generative NeRF-to-NeRF conversion framework. It applies to a range of NeRF conversion tasks, such as text-driven NeRF editing, colorization, super-resolution, and inpainting, and performs strongly across them.

  • Paper address: https://arxiv.org/abs/2404.02788
  • Paper homepage: https://xiangyueliu.github.io/GenN2N/
  • Github address: https://github.com/Lxiangyue/GenN2N
  • Paper title: GenN2N: Generative NeRF2NeRF Translation

In recent years, Neural Radiance Fields (NeRF) have attracted widespread attention in 3D reconstruction, 3D generation, and novel view synthesis thanks to their compactness, high quality, and versatility. However, once a NeRF scene has been created, these methods often offer little further control over the resulting geometry and appearance. NeRF editing has therefore recently become a research focus.

Current NeRF editing methods are usually task-specific, targeting for example text-driven editing, super-resolution, inpainting, or colorization, and each requires a large amount of task-specific domain knowledge. In 2D image editing, by contrast, the trend is toward universal image-to-image conversion methods; for example, the 2D generative model Stable Diffusion supports many kinds of image editing. We therefore propose universal NeRF editing that builds on underlying 2D generative models.

The challenge that comes with this is the representation gap between NeRF and 2D images, especially since image editors often produce inconsistent edits for different viewpoints. A recent text-based NeRF editing method, Instruct-NeRF2NeRF, explores this problem. It adopts a "render-edit-aggregate" pipeline that updates the NeRF scene step by step: it renders multi-view images, edits them, and aggregates the edited images back into the NeRF. However, after lengthy optimization for a specific editing request, this method produces only one specific result. If the user is not satisfied, the whole process has to be repeated.

We therefore propose GenN2N, a general NeRF-to-NeRF framework suitable for a variety of NeRF editing tasks. Its core idea is to model the multi-solution nature of the editing process generatively, so that it can easily produce a large number of edits that meet the requirements for the user to choose from.

At the core of GenN2N: 1) we introduce a 3D VAE-GAN generative framework, in which a VAE represents the entire editing space by learning the distribution of all possible 3D NeRF edits corresponding to a set of input 2D edited images, and a GAN provides reasonable supervision for different views of the edited NeRF to ensure the realism of the results; 2) we use contrastive learning to decouple edit content from viewpoint, ensuring that the edit content is consistent across viewpoints; 3) at inference time, the user simply samples multiple edit codes at random from the conditional generative model to produce a variety of 3D edits matching the editing target.

Compared with SOTA methods for various NeRF editing tasks (ICCV 2023 Oral, etc.), GenN2N outperforms existing methods in editing quality, diversity, and efficiency.

Method introduction

We first perform 2D image editing and then lift these 2D edits to 3D NeRF to achieve generative NeRF-to-NeRF conversion.

A. Latent Distill

We use the Latent Distill Module as the encoder of the VAE. For each edited image it learns a latent edit code, which controls the generated content during NeRF-to-NeRF conversion. Under the constraint of a KL loss, all edit codes follow a well-behaved normal distribution, which makes sampling easier. To decouple edit content from viewpoint, we designed a contrastive learning scheme that encourages the edit codes of images with the same editing style but different viewpoints to be close together, and the edit codes of images with different editing styles but the same viewpoint to be far apart.
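The two constraints on the edit codes can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the hinge `margin`, and the use of Euclidean distance are assumptions made for the example. The KL term is the standard closed form for a diagonal Gaussian against a standard normal.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Batch mean of KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    per_sample = 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=1)
    return float(per_sample.mean())


def contrastive_loss(codes, style_ids, view_ids, margin=1.0):
    """Pull together codes with the same style but different views;
    push apart codes with different styles but the same view.

    The hinge form and `margin` are assumptions for this sketch.
    """
    pull, push = [], []
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            dist = float(np.linalg.norm(codes[i] - codes[j]))
            same_style = style_ids[i] == style_ids[j]
            same_view = view_ids[i] == view_ids[j]
            if same_style and not same_view:
                pull.append(dist**2)  # should be close
            elif same_view and not same_style:
                push.append(max(0.0, margin - dist) ** 2)  # should be far
    loss = 0.0
    if pull:
        loss += sum(pull) / len(pull)
    if push:
        loss += sum(push) / len(push)
    return loss


# Three edit codes: images 0 and 1 share a style, images 0 and 2 share a view.
codes = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 0.0]])
print(contrastive_loss(codes, style_ids=[0, 0, 1], view_ids=[0, 1, 0]))
```

In this toy batch the loss is positive because the same-style pair is far apart while the same-view, different-style pair coincides, which is the opposite of the arrangement the contrastive term rewards.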

B. NeRF-to-NeRF conversion (Translated NeRF)

We use NeRF-to-NeRF Translation as the decoder of the VAE. It takes the edit code as input and modifies the original NeRF into a converted NeRF. We add residual layers between the hidden layers of the original NeRF network. These residual layers take the edit code as input and modulate the hidden-layer neurons, so that the converted NeRF both retains the information in the original NeRF and controls the 3D conversion content according to the edit code. NeRF-to-NeRF Translation also serves as the generator in generative adversarial training. Because it generates results rather than optimizing each one, we can obtain multiple conversion results at once, which significantly improves both the efficiency of NeRF conversion and the diversity of results.
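The idea of edit-code-conditioned residual layers can be illustrated with a toy MLP. This is only a sketch under stated assumptions: the class name `TranslatedMLP`, the layer sizes, the `tanh` residual branch, and the zero initialisation of the residual output (so the network initially reproduces the original one) are all choices made for this example and are not taken from the paper.

```python
import numpy as np


class TranslatedMLP:
    """Toy stand-in for a NeRF MLP with edit-code-conditioned residuals."""

    def __init__(self, in_dim=3, hidden=16, out_dim=4, code_dim=8,
                 n_layers=3, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + [hidden] * n_layers
        # Weights of the "original" network (pretrained in a real system).
        self.W = [rng.normal(0.0, 0.3, (dims[i], dims[i + 1]))
                  for i in range(n_layers)]
        self.W_out = rng.normal(0.0, 0.3, (hidden, out_dim))
        # Residual branch per hidden layer; input is [hidden, edit code].
        self.R_in = [rng.normal(0.0, 0.3, (hidden + code_dim, hidden))
                     for _ in range(n_layers)]
        # Zero-initialised so the residual starts as a no-op (assumption).
        self.R_out = [np.zeros((hidden, hidden)) for _ in range(n_layers)]

    def forward(self, x, z=None):
        """x: (N, in_dim) points; z: (code_dim,) edit code or None."""
        h = x
        for W, R_in, R_out in zip(self.W, self.R_in, self.R_out):
            h = np.maximum(h @ W, 0.0)  # original hidden layer (ReLU)
            if z is not None:
                z_rows = np.broadcast_to(z, (h.shape[0], z.shape[-1]))
                branch = np.tanh(np.concatenate([h, z_rows], axis=1) @ R_in)
                h = h + branch @ R_out  # residual modulation by the code
        return h @ self.W_out


net = TranslatedMLP()
points = np.ones((5, 3))
out = net.forward(points, z=np.zeros(8))
print(out.shape)
```

With the residual outputs at zero the converted network matches the original for any edit code; once the residual weights are trained, different codes steer the same backbone toward different outputs.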

C. Conditional Discriminator

The images rendered from the converted NeRF make up the generation space that the discriminator has to judge. These images differ in both editing style and rendering viewpoint, which makes the generation space very complex. We therefore give the discriminator a condition as additional information. Specifically, when the discriminator judges an image rendered by the generator (negative sample) or an edited image from the training data (positive sample), we select an edited image of the same viewpoint from the training data as the condition. This keeps viewpoint factors from interfering with the discriminator when it distinguishes positive from negative samples.
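One common way to feed a condition to a discriminator is to stack it with the sample along the channel axis; the sketch below assumes that scheme, which may differ from what GenN2N actually does. The linear `toy_discriminator` and its weights are hypothetical placeholders for a real network, and the loss is the standard binary cross-entropy discriminator loss.

```python
import numpy as np


def make_pair(image, condition):
    """Stack a sample with its same-view condition: (H, W, 3) x2 -> (H, W, 6)."""
    if image.shape != condition.shape:
        raise ValueError("image and condition must have the same shape")
    return np.concatenate([image, condition], axis=-1)


def toy_discriminator(pair, weights, bias=0.0):
    """Hypothetical linear discriminator: probability that `pair` is real."""
    logit = float(np.sum(pair * weights) + bias)
    return 1.0 / (1.0 + np.exp(-logit))


def discriminator_loss(p_real, p_fake, eps=1e-8):
    """Binary cross-entropy: push p_real toward 1 and p_fake toward 0."""
    return float(-(np.log(p_real + eps) + np.log(1.0 - p_fake + eps)))


rng = np.random.default_rng(0)
condition = rng.random((4, 4, 3))  # edited training image, same viewpoint
edited = rng.random((4, 4, 3))     # positive sample from the training data
rendered = rng.random((4, 4, 3))   # negative sample rendered by the generator
weights = rng.normal(0.0, 0.1, (4, 4, 6))

p_real = toy_discriminator(make_pair(edited, condition), weights)
p_fake = toy_discriminator(make_pair(rendered, condition), weights)
print(discriminator_loss(p_real, p_fake))
```

Because both the positive and the negative sample are paired with a condition from the same viewpoint, the discriminator can attribute differences to edit content rather than to camera pose.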

D. Inference

After GenN2N has been optimized, users can randomly sample edit codes from the normal distribution and feed them to the converted NeRF to generate high-quality, multi-view-consistent edited 3D NeRF scenes.
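The inference step amounts to drawing edit codes from a standard normal distribution and rendering each one from the desired viewpoints. The sketch below assumes a callable `translated_nerf(code, pose)` that returns a rendering; that interface, the function names, and the dummy renderer are hypothetical and stand in for the trained model.

```python
import numpy as np


def sample_edit_codes(num_edits, code_dim, seed=None):
    """Draw `num_edits` edit codes from a standard normal distribution."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_edits, code_dim))


def generate_edits(translated_nerf, codes, camera_poses):
    """Render every sampled edit from every requested viewpoint.

    Returns a list with one entry per code, each a list with one
    rendering per camera pose.
    """
    return [[translated_nerf(code, pose) for pose in camera_poses]
            for code in codes]


def dummy_nerf(code, pose):
    """Placeholder renderer: returns a tag instead of an image."""
    return (round(float(code.sum()), 3), pose)


codes = sample_edit_codes(num_edits=3, code_dim=8, seed=0)
renders = generate_edits(dummy_nerf, codes, camera_poses=["front", "side"])
print(len(renders), len(renders[0]))
```

Each code is reused across all camera poses, which reflects the intent that a single edit code determines the edit content independently of viewpoint.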

Experiments

We conducted extensive experiments on a variety of NeRF-to-NeRF tasks, including text-driven NeRF editing, colorization, super-resolution, and inpainting. The results demonstrate GenN2N's superior editing quality, multi-view consistency, generation diversity, and editing efficiency.

A. Text-based NeRF editing [figure]

B. NeRF colorization [figure]

C. NeRF super-resolution [figure]

D. NeRF inpainting [figure]

Comparative experiments

We compare our method qualitatively and quantitatively with SOTA methods for specific NeRF tasks, including text-driven editing, colorization, super-resolution, and inpainting. The results show that GenN2N, as a general framework, performs as well as or better than the task-specific SOTA methods, while its editing results are more diverse. The comparison below is between GenN2N and Instruct-NeRF2NeRF on the text-based NeRF editing task.

A. Text-based NeRF editing [figure]

For more on the experiments and methods, please refer to the paper homepage.

Team introduction

This paper comes from the Tan Ping team at Hong Kong University of Science and Technology, the 3DVICI Lab at Tsinghua University, Shanghai Artificial Intelligence Laboratory, and Shanghai Qizhi Research Institute. The authors are Liu Xiangyue, a student at Hong Kong University of Science and Technology; Xue Han, a student at Tsinghua University; and Luo Kunming, a student at Hong Kong University of Science and Technology. The advisors are Professor Yi Li of Tsinghua University and Tan Ping of Hong Kong University of Science and Technology.
