Paper link: https://arxiv.org/pdf/2310.19629
Code link: https://github.com/vLAR-group/RayDF
Homepage: https://vlar-group.github.io/RayDF.html
Implementation method:
The overall pipeline and components of RayDF are shown in Figure 1.
Learning accurate and efficient 3D shape representations is very important in many cutting-edge applications in machine vision and robotics. However, existing implicit representations based on 3D coordinates are computationally expensive when representing 3D shapes or rendering 2D images; in contrast, ray-based methods can infer 3D shapes efficiently. Existing ray-based methods, however, do not take geometric consistency across multiple views into account, which makes it difficult to recover accurate geometry under unseen viewing angles.
To address these problems, this paper proposes RayDF, a ray-based implicit representation that maintains multi-view geometric consistency. Built on a simple ray-surface distance field, the method introduces a new dual-ray visibility classifier and a multi-view consistency optimization module, and learns ray-surface distances that satisfy geometric consistency across multiple views. Experimental results show that the proposed method achieves superior 3D surface reconstruction performance on three data sets and renders 1000 times faster than coordinate-based methods (see Table 1).
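The efficiency argument can be illustrated with a toy sketch. It is not the paper's benchmark: the unit sphere, the function names, and the step limits below are all assumptions made for illustration. A coordinate-based signed distance function has to be queried many times along each ray (here by sphere tracing), whereas a ray-based field maps a ray to its surface distance in a single query (here an analytic ray-sphere intersection stands in for the network).

```python
import math

def sphere_sdf(p, radius=1.0):
    """Coordinate-based stand-in: signed distance from point p to a sphere at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-6):
    """March along the ray, querying the SDF once per step. Returns (distance, queries)."""
    t = 0.0
    for step in range(1, max_steps + 1):
        p = [o + t * d for o, d in zip(origin, direction)]
        gap = sdf(p)
        if gap < eps:
            return t, step
        t += gap
    return None, max_steps

def ray_distance(origin, direction, radius=1.0):
    """Ray-based stand-in: one query maps a ray (unit direction) to its surface distance."""
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # the ray misses the sphere
    return -b - math.sqrt(disc)

# An off-axis ray aimed at the unit sphere.
origin, direction = [0.5, 0.0, -3.0], [0.0, 0.0, 1.0]
traced, queries = sphere_trace(origin, direction, sphere_sdf)
direct = ray_distance(origin, direction)
print(f"sphere tracing: {queries} SDF queries, direct ray query: 1")
```

Both routes agree on the distance up to the tracing tolerance; the difference is the number of field evaluations needed per pixel.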
The following are the main contributions:
(1) First, ray pairs are constructed to train the auxiliary network, the dual-ray visibility classifier. For a ray in an image (corresponding to one pixel), the corresponding surface point in space is known from its ray-surface distance. Projecting this point into the remaining views of the training set yields another ray, and this ray has its own ray-surface distance. The article sets a threshold of 10 mm to determine whether the two rays are mutually visible.
(2) In the second stage, the main network, the ray-surface distance network, is trained so that its predicted distance field satisfies multi-view consistency. As shown in Figure 4, for a main ray and its surface point, several multi-view rays are obtained by sampling uniformly on a sphere centered at that surface point. The main ray is paired with these multi-view rays one by one, and the mutual visibility of each pair is obtained from the trained dual-ray visibility classifier. The ray-surface distance network then predicts the ray-surface distance of each ray. If the main ray and a sampled ray are mutually visible, the surface points computed from their two ray-surface distances should be the same point. A loss function is designed accordingly and used to train the main network, which ultimately makes the ray-surface distance field satisfy multi-view consistency.
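The two stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, distances are assumed to be in metres, the labelling rule (compare the distance from the second camera to the surface point with the distance observed along that ray, using the 10 mm threshold) is one reading of the description, and the L1 point-gap loss and the constant-valued predictors are stand-ins for the real loss and networks.

```python
import math
import random

VISIBILITY_THRESHOLD_M = 0.010  # the 10 mm threshold quoted in the article, in metres

def surface_point(origin, direction, distance):
    """Surface point hit by a ray: origin + distance * unit direction."""
    return [o + distance * d for o, d in zip(origin, direction)]

def ray_to_point(other_origin, point):
    """Ray from another camera centre through the point: (unit direction, length)."""
    offset = [p - o for p, o in zip(point, other_origin)]
    length = math.sqrt(sum(c * c for c in offset))
    return [c / length for c in offset], length

def visibility_label(expected, observed, threshold=VISIBILITY_THRESHOLD_M):
    """Assumed rule: the pair is mutually visible if the two distances nearly agree."""
    return abs(expected - observed) < threshold

def sample_unit_vector(rng):
    """Uniform direction on the unit sphere via a normalised Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def multiview_rays(point, radius, count, rng):
    """Rays that start on a sphere centred at the surface point and aim at it."""
    rays = []
    for _ in range(count):
        u = sample_unit_vector(rng)
        origin = [p + radius * c for p, c in zip(point, u)]
        rays.append((origin, [-c for c in u]))
    return rays

def consistency_loss(main_point, rays, predict_distance, is_visible):
    """Mean L1 gap between the main surface point and points implied by visible rays."""
    total, used = 0.0, 0
    for origin, direction in rays:
        if not is_visible(origin, direction):
            continue  # occluded pairs contribute nothing
        hit = surface_point(origin, direction, predict_distance(origin, direction))
        total += sum(abs(h - m) for h, m in zip(hit, main_point))
        used += 1
    return total / used if used else 0.0

# Stage 1: label a ray pair. The main ray hits the surface 2 m from its camera.
point = surface_point([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 2.0)
_, expected = ray_to_point([2.0, 0.0, 2.0], point)
label_near = visibility_label(expected, 2.004)      # 4 mm gap
label_occluded = visibility_label(expected, 1.2)    # an occluder sits in front

# Stage 2: consistency loss with stand-in predictors and an always-true classifier.
def always_visible(origin, direction):
    return True

rays = multiview_rays(point, radius=0.5, count=8, rng=random.Random(0))
loss_exact = consistency_loss(point, rays, lambda o, d: 0.5, always_visible)
loss_biased = consistency_loss(point, rays, lambda o, d: 0.6, always_visible)
```

A predictor that returns the true distance for every sampled ray drives the loss to zero, while a biased one is penalised; rays the classifier marks as not mutually visible are simply skipped.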
Depth values at the edges of a scene surface often change abruptly (they are discontinuous), whereas a neural network is a continuous function. The ray-surface distance field described above therefore tends to predict inaccurate distance values at surface edges, which produces noise on the geometric surface there. Fortunately, the designed ray-surface distance field has a useful property, shown in Figure 5: the normal vector of each estimated 3D surface point can be obtained in closed form through automatic differentiation of the network. The normal-vector Euclidean distance of each surface point can therefore be computed at inference time. If this distance is greater than a threshold, the surface point is regarded as an outlier and removed, which yields a clean reconstructed 3D surface.
Figure 5 Surface normal calculation
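The outlier removal described above can be illustrated with a small sketch. Several things here are assumptions rather than the paper's method: central finite differences stand in for the automatic differentiation the article mentions, the toy field is parameterised by two image-plane coordinates, and thresholding the Euclidean norm of the distance derivative is only one possible reading of the criterion. The function names and the threshold value are hypothetical.

```python
import math

def toy_distance(u, v):
    """Hypothetical continuous field: a depth edge at u = 0, smoothed as a network would."""
    return 2.0 + 0.5 * math.tanh(50.0 * u)

def gradient_norm(field, u, v, h=1e-4):
    """Euclidean norm of the field's derivative, by central finite differences."""
    du = (field(u + h, v) - field(u - h, v)) / (2.0 * h)
    dv = (field(u, v + h) - field(u, v - h)) / (2.0 * h)
    return math.hypot(du, dv)

def remove_outliers(field, samples, threshold):
    """Keep only samples whose derivative norm does not exceed the threshold."""
    return [(u, v) for u, v in samples if gradient_norm(field, u, v) <= threshold]

# Samples far from the edge are kept; samples on the smoothed edge are dropped.
samples = [(-0.5, 0.0), (-0.01, 0.0), (0.0, 0.0), (0.01, 0.0), (0.5, 0.0)]
kept = remove_outliers(toy_distance, samples, threshold=1.0)
```

Because the continuous field interpolates across the discontinuity, its derivative becomes very large near the edge, which is what makes a simple threshold usable as a filter in this toy setting.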
To verify the effectiveness of the proposed method, experiments were conducted on three data sets: the object-level synthetic data set Blender [1], the scene-level synthetic data set DM-SR [2], and the scene-level real data set ScanNet [3]. Seven baselines were selected for performance comparison. Among them, OF [4], DeepSDF [5], NDF [6], and NeuS [7] are coordinate-based level-set methods, DS-NeRF [8] is a depth-supervised NeRF-based method, and LFN [9] and PRIF [10] are two ray-based baselines.
Because a radiance branch can easily be added to RayDF to learn texture, it can also be compared with baselines that support radiance field prediction. The comparative experiments in the paper are therefore divided into two groups. The first group (Group 1) predicts only distance (geometry), and the second group (Group 2) predicts both distance and radiance (geometry and texture).
As can be seen from Table 2 and Figure 6, in both Group 1 and Group 2 RayDF achieves better surface reconstruction results than the coordinate-based and ray-based baselines, especially on the most important ADE metric. In terms of radiance field rendering, RayDF also achieves performance comparable to DS-NeRF and better than LFN and PRIF.
Figure 6 Visual comparison of Blender data set
As can be seen from Table 3, RayDF surpasses all baselines on the most critical ADE metric. In the Group 2 experiment, RayDF also obtains high-quality novel view synthesis while still recovering an accurate surface shape (see Figure 7).
Figure 7 Visual comparison of DM-SR data set
Table 4 compares the performance of RayDF and the baselines in challenging real-world scenes. In both Group 1 and Group 2, RayDF significantly outperforms the baselines on almost all evaluation metrics, showing clear advantages in recovering complex real-world 3D scenes.
Figure 8 Visual comparison of ScanNet data set
We conducted ablation experiments on the Blender data set. Table 5 in the paper shows the ablation results for the key dual-ray visibility classifier.
Other ablation experiments can be found in the paper and its appendix.
Figure 9 Visual comparison of using and not using the classifier
In summary, the paper studies a ray-based multi-view consistency framework and concludes that 3D shape representations can be learned efficiently and accurately in this way. A simple ray-surface distance field represents the geometry of a 3D shape, and a novel dual-ray visibility classifier is used to further achieve multi-view geometric consistency. Experiments on multiple data sets show that RayDF has extremely high rendering efficiency and excellent performance. Further extensions to the RayDF framework are welcome. More visualization results are available on the homepage:
https://vlar-group.github.io/RayDF.html
Original link: https://mp.weixin.qq.com/s/dsrSHKT4NfgdDPYcKOhcOA