Brainstorming new perspectives: a unified NeRF codebase framework is now open source


WBOY

May 15, 2023, 08:58 PM


Suppose you look at a few photos of an object: can you imagine how it looks from other angles? People can. We can guess what the parts we have not seen look like, or how the object appears from angles we have never viewed. Models can do this too: given a few pictures of a scene, they can imagine images from viewpoints they have never seen.

When it comes to rendering new viewpoints, the most eye-catching recent work is NeRF (Neural Radiance Fields), which received an Honorable Mention for the ECCV 2020 Best Paper Award. It skips the complicated 3D reconstruction process that earlier methods required: it needs only a few photos and the camera pose of each photo to synthesize an image from a new viewpoint. NeRF's stunning results attracted many vision researchers, and a series of excellent follow-up works soon appeared.

The difficulty is that such models are complex to build, and there is currently no unified codebase for implementing them, which hinders further exploration and development in this field. To address this, the OpenXRLab rendering and generation platform has built XRNeRF, a highly modular algorithm library that helps quickly build, train, and run inference with NeRF-style models.


Open-source repository: https://github.com/openxrlab/xrnerf

What is a NeRF-style model?

NeRF-style tasks generally take scene information captured from known viewpoints, namely the captured images together with the intrinsic and extrinsic camera parameters of each image, and use it to synthesize images from new viewpoints. The diagram from the NeRF paper illustrates this task clearly.


Source: arXiv:2003.08934 (the original NeRF paper).
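As a concrete illustration of how the intrinsic and extrinsic parameters are used, here is a minimal pure-Python sketch that turns one pixel into a camera ray. It assumes a pinhole camera described by image width, height, and a single focal length, a 3x4 camera-to-world matrix, and the OpenGL-style convention (camera looking down −z) used by the original NeRF's synthetic Blender data; `pixel_ray` is a hypothetical helper, not an XRNeRF API.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray(i: float, j: float, width: int, height: int, focal: float,
              c2w: Sequence[Sequence[float]]) -> Tuple[Vec3, Vec3]:
    """Return (origin, direction) in world space for pixel column i and row j.

    c2w is a 3x4 camera-to-world matrix [R | t]. Assumes the OpenGL-style
    camera frame: x right, y up, camera looking down -z.
    """
    # Direction in camera coordinates; image rows grow downward, hence the minus.
    d_cam = ((i - width * 0.5) / focal, -(j - height * 0.5) / focal, -1.0)
    # Rotate into world coordinates with the 3x3 rotation block R of c2w.
    dx, dy, dz = (sum(c2w[r][k] * d_cam[k] for k in range(3)) for r in range(3))
    # Every ray of this camera starts at the camera center, the translation t.
    origin = (c2w[0][3], c2w[1][3], c2w[2][3])
    return origin, (dx, dy, dz)


# Camera 4 units along +z, looking at the origin (identity rotation).
pose = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 4.0]]
origin, direction = pixel_ray(50, 50, width=100, height=100, focal=80.0, c2w=pose)
```

Here the center pixel of a 100x100 image maps to the direction (0, 0, −1), straight down the optical axis toward the origin.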

NeRF describes each point in the scene with a 5-dimensional coordinate: a 3D position (x, y, z) plus a 2D viewing direction (θ, φ). The scene is modeled as a radiance field by a multi-layer perceptron (MLP): the MLP takes a sampled point and its viewing direction as input and maps them to the point's volume density and RGB color. Volume rendering then renders the radiance field into photorealistic images from virtual viewpoints.
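The volume rendering step can be illustrated with the standard NeRF quadrature: for samples i along a ray with density σ_i, color c_i and spacing δ_i, the pixel color is C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where T_i = Π_{j<i} exp(−σ_j δ_j) is the transmittance. The sketch below assumes the MLP has already produced σ_i and c_i for one ray; `composite_ray` is an illustrative name, not a function from XRNeRF.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(sigmas: Sequence[float], colors: Sequence[RGB],
                  deltas: Sequence[float]) -> Tuple[RGB, List[float]]:
    """Alpha-composite samples ordered front to back along one ray.

    sigmas: volume density at each sample (>= 0)
    colors: RGB color at each sample, each channel in [0, 1]
    deltas: distance between consecutive samples
    Returns the pixel color and the per-sample weights T_i * alpha_i.
    """
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    pixel = [0.0, 0.0, 0.0]
    weights: List[float] = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha               # contribution of sample i
        weights.append(w)
        for c in range(3):
            pixel[c] += w * color[c]
        transmittance *= 1.0 - alpha            # light left for later samples
    return (pixel[0], pixel[1], pixel[2]), weights
```

The returned weights are also what the original NeRF's hierarchical sampling uses to decide where to place its finer samples.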

As shown in the picture above, once a radiance field has been built from a set of pictures, images of the drum set can be generated from new viewpoints. Because NeRF obtains new views without explicit 3D reconstruction, it offers a deep-learning-based paradigm of implicit 3D representation: a deep neural network can learn 3D scene information from 2D posed images alone.

Since NeRF, NeRF-style models have proliferated. Mip-NeRF casts cones instead of rays to render fine structures better; KiloNeRF replaces a single large MLP with thousands of tiny MLPs, cutting computation enough to achieve real-time rendering; models such as AniNeRF and Neural Body learn to render humans from new viewpoints using short video clips, achieving good view synthesis and animation; and GNR uses sparse-view images and geometric priors to achieve human rendering that generalizes across different identities.


GNR's generalizable implicit human body representation renders different people with a single model.

Putting wheels on NeRF

Although NeRF algorithms are very popular in research, they are still relatively new, so implementing these models is somewhat more troublesome. If you use a general framework such as PyTorch or TensorFlow, you usually have to find an existing implementation of a similar NeRF model first and then modify it.

This approach has several obvious problems. First, you have to fully understand an implementation before you can change it into what you want, which is a considerable amount of work. Second, because the official implementations of different papers are not unified, comparing the source code of different NeRF models takes a lot of effort; after all, nobody knows whether a given paper's training process hides some novel tricks. Finally, without a unified codebase, validating ideas for new models is much slower.

To address these problems, OpenXRLab built XRNeRF, a unified and highly modular codebase for NeRF-style models.


XRNeRF implements many NeRF models, which makes it easy to get started and to reproduce the experimental results of the corresponding papers. XRNeRF divides these models into five modules: datasets, mlp, network, embedder and render. Its ease of use comes from the config mechanism, which assembles different modules into a complete model. This is simple to use and greatly increases reusability.

Beyond ease of use, a framework also needs flexibility. Through a separate registry mechanism, XRNeRF lets you customize the behavior or implementation of individual modules, which decouples the components further and makes the code easier to understand.
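A minimal sketch of how a registry and dict configs can work together, in the style popularized by OpenMMLab's MMCV. Whether XRNeRF's registry behaves exactly like this is an assumption, and every class and key name below (`Registry`, `MODELS`, `SimpleNetwork`, `num_freqs`, and so on) is hypothetical.

```python
from typing import Any, Dict


class Registry:
    """Maps a class name to the class so configs can refer to modules by name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self, cls: type) -> type:
        # Used as a class decorator; duplicate names are rejected.
        if cls.__name__ in self._modules:
            raise KeyError(f"{cls.__name__} is already registered in {self.name}")
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg: Dict[str, Any]) -> Any:
        # Work on a copy so the caller's config dict is not mutated.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        # Nested dicts that carry a "type" key are built recursively first.
        kwargs = {
            key: self.build(value) if isinstance(value, dict) and "type" in value else value
            for key, value in args.items()
        }
        return cls(**kwargs)


MODELS = Registry("models")  # hypothetical single registry for all modules


@MODELS.register_module
class PositionalEmbedder:
    def __init__(self, num_freqs: int) -> None:
        self.num_freqs = num_freqs


@MODELS.register_module
class TinyMLP:
    def __init__(self, width: int, depth: int) -> None:
        self.width, self.depth = width, depth


@MODELS.register_module
class VolumeRender:
    def __init__(self, white_bkgd: bool = False) -> None:
        self.white_bkgd = white_bkgd


@MODELS.register_module
class SimpleNetwork:
    def __init__(self, embedder: Any, mlp: Any, render: Any) -> None:
        self.embedder, self.mlp, self.render = embedder, mlp, render


# A dict config describing the whole model; all keys are hypothetical.
model_cfg = dict(
    type="SimpleNetwork",
    embedder=dict(type="PositionalEmbedder", num_freqs=10),
    mlp=dict(type="TinyMLP", width=256, depth=8),
    render=dict(type="VolumeRender", white_bkgd=True),
)
net = MODELS.build(model_cfg)
```

The config only names modules and their arguments, so swapping an implementation means registering a new class and changing one `type` string, without touching the code that builds the model.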

In addition, every algorithm in XRNeRF follows a pipeline pattern. The data pipeline reads the raw data and turns it, through a series of processing steps, into model inputs; the model pipeline then processes those inputs to produce the corresponding outputs. The pipeline ties the config mechanism and the registry mechanism together into a complete architecture.
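On the data side, the pipeline pattern boils down to applying a list of ops in order, each taking and returning a sample dict. Below is a minimal sketch with two hypothetical ops; the RGBA-over-white blending mirrors what the original NeRF does for its synthetic Blender scenes, but `Compose`, `to_unit_range`, and `blend_white_background` are illustrative names, not XRNeRF's.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]
Op = Callable[[Sample], Sample]


class Compose:
    """Run data-processing ops in order; each op maps a sample dict to a new one."""

    def __init__(self, ops: List[Op]) -> None:
        self.ops = ops

    def __call__(self, sample: Sample) -> Sample:
        for op in self.ops:
            sample = op(sample)
        return sample


def to_unit_range(sample: Sample) -> Sample:
    # Hypothetical op: scale 8-bit RGBA values to [0, 1].
    pixels = [tuple(c / 255.0 for c in px) for px in sample["pixels"]]
    return {**sample, "pixels": pixels}


def blend_white_background(sample: Sample) -> Sample:
    # Hypothetical op: composite RGBA over white, rgb * a + (1 - a), dropping alpha.
    pixels = [tuple(c * px[3] + (1.0 - px[3]) for c in px[:3]) for px in sample["pixels"]]
    return {**sample, "pixels": pixels}


data_pipeline = Compose([to_unit_range, blend_white_background])
# One opaque red pixel and one fully transparent pixel.
result = data_pipeline({"pixels": [(255, 0, 0, 255), (0, 0, 0, 0)]})
```

Because each op returns a new dict instead of mutating its input, ops can be reordered or reused across datasets without hidden side effects.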

XRNeRF implements many core NeRF models and ties them together through these three mechanisms, producing a highly modular code framework that is both easy to use and flexible.

The core features of XRNeRF

XRNeRF is a library of NeRF-style algorithms built on PyTorch. It reproduces 8 classic papers covering both scene and human body rendering. Compared with building models directly on a deep learning framework, XRNeRF significantly improves modeling efficiency, cost, and flexibility, and it comes with complete documentation, examples, and an issue feedback mechanism. In summary, XRNeRF has the following five core features.

1. Implements many mainstream, core algorithms

These include the pioneering NeRF, Neural Body (CVPR 2021 Best Paper Candidate), Mip-NeRF (ICCV 2021 Best Paper Honorable Mention), and Instant NGP (SIGGRAPH 2022 Best Paper).


Beyond implementing these models, XRNeRF keeps the reproduced results basically consistent with those reported in the papers. As shown in the figure below, judging by the objective PSNR and SSIM metrics, it reproduces the results of the original code well.

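PSNR, one of the two metrics mentioned above, is simple to compute: PSNR = 10·log10(MAX² / MSE), which reduces to −10·log10(MSE) for images scaled to [0, 1]. Here is a minimal pure-Python sketch over flattened pixel lists; SSIM relies on local windowed statistics and is omitted here.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flattened sequences of pixel values in [0, max_val].
    Higher is better; identical images give infinity.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)
```

An error of 0.1 on every pixel of a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of about 20 dB.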

2. Modular design

XRNeRF modularizes the entire code framework to maximize code reuse and make it easy for researchers to read and modify existing code. Based on an analysis of existing NeRF-style methods, XRNeRF's module design is shown in the following figure:


The advantage of modularity is that if we need to change the data format, we only need to modify the logic in the Dataset module; if we need to change how images are rendered, we only need to modify the Render module.

3. Standard data processing pipeline

To handle the complex and diverse preprocessing needs of NeRF algorithms, XRNeRF provides a standard data processing pipeline built by chaining multiple data processing operations in series. You only need to modify the data pipeline section of the config file to build the data processing flow.


The data pipeline section of a NeRF config.

XRNeRF already implements the data processing ops required by multiple datasets; you only need to list these ops in order in the config to build the data processing flow. If a new op is needed later, you only need to implement it in the corresponding folder, and a single line in the config adds it to the whole data processing flow.
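Extending the data pipeline can be sketched as follows: a new op is a class registered under its name, and adding it to the processing flow is one more dict in the config list. The registry dict, op classes, and config keys here (`PIPELINES`, `Scale`, `Clamp`, `keys`) are hypothetical stand-ins, not XRNeRF's actual ops.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]
PIPELINES: Dict[str, type] = {}  # hypothetical registry of data ops


def register_op(cls: type) -> type:
    """Class decorator: make an op constructible by its class name in a config."""
    PIPELINES[cls.__name__] = cls
    return cls


@register_op
class Scale:
    """Existing op: multiply the listed keys by a constant factor."""

    def __init__(self, keys: List[str], factor: float) -> None:
        self.keys, self.factor = keys, factor

    def __call__(self, sample: Sample) -> Sample:
        out = dict(sample)
        for k in self.keys:
            out[k] = [v * self.factor for v in out[k]]
        return out


@register_op
class Clamp:
    """Newly added op: clip the listed keys into [low, high]."""

    def __init__(self, keys: List[str], low: float = 0.0, high: float = 1.0) -> None:
        self.keys, self.low, self.high = keys, low, high

    def __call__(self, sample: Sample) -> Sample:
        out = dict(sample)
        for k in self.keys:
            out[k] = [min(max(v, self.low), self.high) for v in out[k]]
        return out


def build_pipeline(cfg: List[Dict[str, Any]]) -> Callable[[Sample], Sample]:
    """Instantiate each op from its config dict and chain them in order."""
    ops = []
    for op_cfg in cfg:
        args = dict(op_cfg)
        ops.append(PIPELINES[args.pop("type")](**args))

    def run(sample: Sample) -> Sample:
        for op in ops:
            sample = op(sample)
        return sample

    return run


train_pipeline = [
    dict(type="Scale", keys=["rgb"], factor=1 / 255.0),
    dict(type="Clamp", keys=["rgb"]),  # the single line that adds the new op
]
run_pipeline = build_pipeline(train_pipeline)
```

Only the `Clamp` class and its one config entry are new; the builder and the existing ops are untouched.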

4. Modular network construction method

The model in XRNeRF mainly consists of an embedder, an MLP, and a render model, connected through a network. These components are decoupled from one another, so modules can be swapped between different algorithms.

The embedder takes a point's position and viewing direction and outputs embedded features; the MLP takes the embedder's output and outputs the density and RGB color of each sample point; the render model takes the MLP's output and integrates along the sample points on each ray to obtain the RGB value of one image pixel. The standard network module connects these three modules into a complete model.
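To show how the three modules can fit together, here is a minimal pure-Python sketch of a network that samples points along a ray, embeds them with NeRF's sinusoidal positional encoding γ(p) = (sin(2^k π p), cos(2^k π p)) for k = 0 to L−1, runs an untrained stand-in MLP, and composites the result. For brevity it ignores the viewing direction (the original NeRF also feeds the encoded direction to the MLP), and all class and function names are hypothetical rather than XRNeRF's.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
RGB = Tuple[float, float, float]


def positional_embed(p: Point, num_freqs: int = 4) -> List[float]:
    """Sinusoidal encoding of each coordinate at frequencies 2^k * pi."""
    feats: List[float] = []
    for k in range(num_freqs):
        for x in p:
            feats.append(math.sin((2.0 ** k) * math.pi * x))
            feats.append(math.cos((2.0 ** k) * math.pi * x))
    return feats


class TinyMLP:
    """Untrained stand-in for the MLP: one random linear layer -> (density, rgb)."""

    def __init__(self, in_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Four outputs: one density and three color channels.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(4)]

    def __call__(self, feats: Sequence[float]) -> Tuple[float, RGB]:
        z = [sum(w * f for w, f in zip(row, feats)) for row in self.weights]
        sigma = max(z[0], 0.0)  # ReLU keeps density non-negative
        r, g, b = (1.0 / (1.0 + math.exp(-v)) for v in z[1:])  # sigmoid maps to (0, 1)
        return sigma, (r, g, b)


def volume_render(sigmas: Sequence[float], colors: Sequence[RGB], delta: float) -> RGB:
    """Alpha-composite samples front to back with a constant spacing delta."""
    transmittance, pixel = 1.0, [0.0, 0.0, 0.0]
    for sigma, rgb in zip(sigmas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)
        for c in range(3):
            pixel[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha
    return (pixel[0], pixel[1], pixel[2])


class Network:
    """Chains embedder -> MLP -> render for one ray; each part is replaceable."""

    def __init__(self, embedder: Callable, mlp: Callable, render: Callable,
                 num_samples: int = 16, near: float = 2.0, far: float = 6.0) -> None:
        self.embedder, self.mlp, self.render = embedder, mlp, render
        self.num_samples, self.near, self.far = num_samples, near, far

    def __call__(self, origin: Point, direction: Point) -> RGB:
        delta = (self.far - self.near) / self.num_samples
        sigmas, colors = [], []
        for i in range(self.num_samples):
            t = self.near + (i + 0.5) * delta  # midpoint of the i-th interval
            point = tuple(o + t * d for o, d in zip(origin, direction))
            sigma, rgb = self.mlp(self.embedder(point))
            sigmas.append(sigma)
            colors.append(rgb)
        return self.render(sigmas, colors, delta)


net = Network(embedder=positional_embed, mlp=TinyMLP(in_dim=3 * 2 * 4), render=volume_render)
pixel = net((0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
```

Because `Network` only relies on the call signatures of its three parts, replacing one of them (for example, a different embedding with the same output length) requires no change to the network itself.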


Code structure of a custom network module.

5. Good reproduction results

XRNeRF supports training a network in as little as 60 seconds and real-time rendering at 30 frames per second, as well as high-definition, anti-aliased, multi-scale rendering of scenes and human bodies. Judging by both the objective PSNR and SSIM metrics and the subjective demos, XRNeRF reproduces the results of the original code well.

Usage of XRNeRF

XRNeRF's features look strong, and it is also simple and convenient to use. Take installation as an example: XRNeRF depends on many components, such as PyTorch, a CUDA environment, and vision processing libraries, but it provides a Docker environment, so the image can be built directly from the Dockerfile.


We tried it. Compared with setting up the runtime environment and installing packages step by step, a single docker build command is clearly much more convenient. In addition, the Dockerfile configures package mirrors located in mainland China, so the image builds quickly and network problems are basically not a concern.

After building the image and starting a container from it, we can copy the project code and data into the container with the docker cp command. Alternatively, you can mount the project directory into the container with the -v option when creating it. Note that the dataset must be placed in a specific location (otherwise the config file has to be changed), such as the data folder under the XRNeRF project.

Generally speaking, after downloading the data, the approximate folder structure is as shown below:


Now the environment, data, and code are all ready. A single short command runs training and validation of the NeRF model:

python run_nerf.py --config configs/nerf/nerf_blender_base01.py --dataname lego


Here, dataname specifies a particular dataset in the data directory, and config specifies the model's configuration file. Because XRNeRF is highly modular, its configs are built from dictionaries. They may look a little cumbersome at first glance, but once you understand XRNeRF's design, they are easy to read.

The config file (nerf_blender_base01.py) contains everything needed to train the model, including the optimizer, distributed strategy, model architecture, data preprocessing, and iteration settings, and even many image-processing settings. In short, apart from the concrete code implementation, the config file describes the entire training and inference process.


The config section that describes the model structure.
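Because the config is a Python file of dictionaries, loading it amounts to executing the file and collecting its top-level names. Here is a minimal sketch using the standard library's `runpy.run_path`; the config contents below are hypothetical and only mirror the kinds of settings described above (optimizer, iterations, model, data pipeline), not the actual contents of nerf_blender_base01.py, and `load_config` is an illustrative helper.

```python
import os
import runpy
import tempfile
from typing import Any, Dict

# Hypothetical config contents; every type name and key is illustrative only.
CONFIG_SOURCE = '''
optimizer = dict(type="Adam", lr=5e-4)
max_iters = 200000
model = dict(
    type="ExampleNetwork",
    embedder=dict(type="ExampleEmbedder", multires=10),
    mlp=dict(type="ExampleMLP", netdepth=8, netwidth=256),
    render=dict(type="ExampleRender", white_bkgd=True),
)
train_pipeline = [
    dict(type="ExampleLoadImages"),
    dict(type="ExampleSampleRays", n_rays=1024),
]
'''


def load_config(path: str) -> Dict[str, Any]:
    """Execute a Python config file and keep its public top-level names."""
    namespace = runpy.run_path(path)
    # Drop module dunders such as __name__ and __builtins__.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_config.py")
    with open(path, "w") as f:
        f.write(CONFIG_SOURCE)
    cfg = load_config(path)
```

Keeping configs as plain Python means they can use comments, arithmetic, and shared variables, while the loaded result is still an ordinary dict that a registry-based builder can consume.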

Overall, XRNeRF was fairly smooth to use, from setting up the basic runtime environment to running the training tasks. By editing the config file or implementing specific ops, you also get a great deal of modeling flexibility. Compared with building models directly on a deep learning framework, XRNeRF saves a lot of development work, letting researchers and algorithm engineers spend more time on innovation in models and tasks.

NeRF-style models remain a research focus in computer vision. A unified codebase like XRNeRF, much like Hugging Face's Transformers library, can gather more and more excellent research work, new code, and new ideas. In turn, XRNeRF can greatly accelerate researchers' exploration of NeRF-style models, making it easier to apply this new field to new scenarios and tasks and helping realize NeRF's potential.
