What, you don’t know NeRF yet?
As the hottest AI technology in computer vision this year, NeRF is already widely used and has a bright future.
Users on Bilibili (Station B) have been finding creative new uses for it.
So, what exactly is NeRF?
NeRF (Neural Radiance Fields) is a concept first proposed in a paper that received a Best Paper Honorable Mention at ECCV 2020. It takes implicit representation to a new level: using only posed 2D images as supervision, it can represent complex three-dimensional scenes.
The paper made waves. Since then, NeRF has developed rapidly and been applied in many technical directions, such as novel view synthesis and three-dimensional reconstruction.
NeRF takes a sparse set of posed images from multiple angles as training input and produces a neural radiance field model. From this model, sharp images can be rendered from any viewing angle, as shown in the figure below. In short, it uses an MLP to implicitly learn a three-dimensional scene.
Netizens will naturally compare NeRF with the equally popular Deepfake.
A recent article published by MetaPhysics took stock of the evolutionary history, challenges and advantages of NeRF, and predicted that NeRF will eventually replace Deepfake.
Most of the headline-grabbing coverage of deepfake technology refers to two open source software packages that have become popular since deepfakes entered the public eye in 2017: DeepFaceLab (DFL) and FaceSwap.
While both packages have extensive user bases and active developer communities, neither project has deviated significantly from its original GitHub code.
Of course, the developers of DFL and FaceSwap have not been idle: it is now possible to train deepfake models using larger input images, although this requires more expensive GPUs.
In fact, the improvement in deepfake image quality touted by the media over the past three years is mainly due to end users.
They have accumulated hard-won experience in data collection and in the best ways to train models (a single experiment can sometimes take weeks), and they have learned how to exploit and extend the original 2017 code to its outermost limits.
Some in the VFX and ML research communities are trying to break through the "hard limits" of the popular deepfake packages by extending the architecture so that machine learning models can be trained on images of up to 1024×1024 pixels. That is twice the current practical range of DeepFaceLab or FaceSwap, and closer to the resolutions useful in film and television production.
Let’s learn about NeRF together~
NeRF (Neural Radiance Fields), which appeared in 2020, is a method of reconstructing objects and environments by stitching together photos from multiple viewpoints inside a neural network.
It achieves state-of-the-art results for synthesizing views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.
The algorithm represents a scene with a fully connected deep network whose input is a single continuous 5D coordinate (spatial position (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and the view-dependent emitted radiance at that spatial location.
The view is synthesized by querying 5D coordinates along the camera ray, and using classic volume rendering techniques to project the output color and density into the image.
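The compositing step described above can be sketched in a few lines. This is a minimal illustration, assuming the usual discrete approximation of volume rendering, in which each sample's color is weighted by its opacity and by how much light survives to reach it. The function name `render_ray` and its arguments are hypothetical and chosen only for this example.

```python
import math

def render_ray(sigmas, colors, deltas):
    """Composite per-sample densities and colors into one pixel color.

    sigmas: volume density at each sample along the ray
    colors: (r, g, b) color emitted at each sample
    deltas: length of the ray segment that each sample covers
    """
    pixel = [0.0, 0.0, 0.0]
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha            # light left after this segment
    return pixel, weights

# Empty space, then an opaque red surface that hides a green one behind it.
pixel, weights = render_ray(
    sigmas=[0.0, 1e6, 1e6],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[1.0, 1.0, 1.0],
)
print(pixel)  # [1.0, 0.0, 0.0]
```

The returned weights are worth keeping: they indicate which parts of the ray actually contribute to the image, which is the kind of information a sampling strategy can reuse.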
Implementation process:
First, a continuous scene is represented as a 5D vector-valued function whose input is a 3D position and a 2D viewing direction, and whose corresponding output is an emitted color c and a volume density σ.
In practice, the 3D Cartesian unit vector d is used to represent the direction. This continuous 5D scene representation is approximated with an MLP network and its weights are optimized.
Additionally, the representation is encouraged to be consistent across multiple views by restricting the network to predict volume density σ as a function of position x, while also allowing RGB color c to be predicted as a function of position and viewing direction.
To achieve this, the MLP first processes the input 3D coordinates x with 8 fully connected layers (using ReLU activations and 256 channels per layer), and outputs σ and a 256-dimensional feature vector.
This feature vector is then concatenated with the viewing direction of the camera ray and passed to an additional fully connected layer that outputs the view-dependent RGB color.
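The layer layout described above can be sketched as follows. This is a rough pure-Python illustration with random, untrained weights, not the authors' implementation. The class name `TinyNeRF`, the 128-channel width of the extra layer, the sigmoid on the color output, and the ReLU that keeps σ non-negative are assumptions made for this example; positional encoding and training are left out entirely.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    """Apply one fully connected layer to the vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    # Numerically stable form of 1 / (1 + exp(-v)).
    return 0.5 * (1.0 + math.tanh(0.5 * v))

class TinyNeRF:
    """Hypothetical, untrained stand-in for the MLP described in the text."""

    def __init__(self, width=256, depth=8, seed=0):
        rng = random.Random(seed)
        sizes = [3] + [width] * depth
        self.trunk = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.sigma_head = make_layer(width, 1, rng)
        self.feature_head = make_layer(width, width, rng)
        # Assumed: one extra layer of width // 2 channels, then an RGB output.
        self.color_hidden = make_layer(width + 3, width // 2, rng)
        self.color_head = make_layer(width // 2, 3, rng)

    def __call__(self, x, d):
        h = list(x)
        for layer in self.trunk:               # position-only trunk
            h = relu(linear(layer, h))
        sigma = relu(linear(self.sigma_head, h))[0]  # density ignores direction
        feature = linear(self.feature_head, h)
        h = relu(linear(self.color_hidden, feature + list(d)))  # add view direction
        rgb = [sigmoid(v) for v in linear(self.color_head, h)]
        return sigma, rgb

model = TinyNeRF(width=32)  # small width so the pure-Python demo runs quickly
sigma_a, rgb_a = model((0.1, 0.2, 0.3), (0.0, 0.0, 1.0))
sigma_b, rgb_b = model((0.1, 0.2, 0.3), (1.0, 0.0, 0.0))
print(sigma_a == sigma_b)  # True: same position, same density
```

Because the viewing direction only enters after σ has been computed, the density at a point cannot change with the camera, which is how the architecture enforces consistency across views.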
In addition, NeRF introduces two improvements that make it possible to represent complex scenes at high resolution. The first is positional encoding, which helps the MLP represent high-frequency functions; the second is a hierarchical sampling procedure, which lets it sample this high-frequency representation efficiently.
As is well known, positional encoding in the Transformer architecture provides the discrete position of each token in the sequence as input to the whole architecture. NeRF instead uses positional encoding to map continuous input coordinates into a higher-dimensional space, making it easier for the MLP to approximate higher-frequency functions.
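As a small sketch of what such a mapping can look like: the original NeRF paper encodes each coordinate p with sines and cosines at frequencies 2^k·π for k = 0 … L−1, using L = 10 for positions and L = 4 for viewing directions. Those details come from the paper rather than from this article, and the function name `positional_encoding` is hypothetical.

```python
import math

def positional_encoding(p, num_freqs):
    """Map each coordinate of p to sines and cosines at num_freqs octaves."""
    encoded = []
    for value in p:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * value
            encoded.append(math.sin(angle))
            encoded.append(math.cos(angle))
    return encoded

position = (0.1, 0.2, 0.3)
direction = (0.0, 0.0, 1.0)
print(len(positional_encoding(position, 10)))  # 60 values instead of 3
print(len(positional_encoding(direction, 4)))  # 24 values instead of 3
```

Two nearby points that differ only slightly in (x, y, z) end up far apart in the high-frequency components of the encoding, so the MLP no longer has to learn sharp variations from almost identical inputs.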
As can be observed from the figure, removing positional encoding will greatly reduce the model's ability to represent high-frequency geometry and texture, ultimately leading to an over-smooth appearance.
Because densely evaluating the neural radiance field network at N query points along each camera ray is very inefficient, NeRF adopts a hierarchical representation that improves rendering efficiency by allocating samples in proportion to their expected effect on the final rendering.
In short, NeRF no longer uses only one network to represent the scene, but optimizes two networks at the same time, a "coarse-grained" network and a "fine-grained" network.
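The coarse-to-fine idea can be illustrated with a simplified sketch: place coarse samples evenly (with jitter) along the ray, then draw extra samples where the coarse pass assigned the most weight. The function names, the use of fixed bin edges, and the small epsilon that keeps every bin reachable are assumptions made for this example; in a real system the weights would come from rendering the coarse network.

```python
import bisect
import random

def stratified_samples(near, far, n, rng):
    """One uniformly random depth inside each of n equal bins in [near, far]."""
    step = (far - near) / n
    return [near + (i + rng.random()) * step for i in range(n)]

def importance_samples(bin_edges, weights, n, rng, eps=1e-8):
    """Draw n depths from the piecewise-constant PDF defined by the weights."""
    total = sum(weights) + eps * len(weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + (w + eps) / total)  # strictly increasing CDF
    samples = []
    for _ in range(n):
        u = rng.random()
        # Find the bin whose CDF interval contains u (inverse transform sampling).
        i = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        frac = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)

rng = random.Random(0)
coarse = stratified_samples(2.0, 6.0, 4, rng)
# Pretend the coarse pass found a surface only in the third bin, [4, 5].
fine = importance_samples([2.0, 3.0, 4.0, 5.0, 6.0], [0.0, 0.0, 1.0, 0.0], 8, rng)
print(coarse)
print(fine)
```

The fine samples cluster around the depths that matter, so the second network spends its evaluations near visible surfaces rather than in empty space.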
NeRF addresses the shortcomings of earlier work that used MLPs to represent objects and scenes as continuous functions, and it produces better renderings than those previous methods.
However, NeRF also faces many technical bottlenecks. For example, methods that accelerate NeRF tend to sacrifice other useful properties (such as flexibility) to achieve lower latency, more interactive environments, and shorter training times.
So, although NeRF is a key breakthrough, it still takes a certain amount of time to achieve perfect results.
Technology is progressing, and the future is still promising!