AIxiv is the column where this site publishes academic and technical content. Over the past few years, it has carried more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
The authors of this paper include Yang Runyi, a master's student at Imperial College London; Zhu Zhenxin, a second-year master's student at Beihang University; Jiang Zhou, a second-year master's student at Beijing Institute of Technology; Ye Baijun, a fourth-year undergraduate at Beijing Institute of Technology; Zhang Yifei, a third-year undergraduate at the University of Chinese Academy of Sciences; Zhao Jian, head of the Multimedia Cognitive Learning Laboratory (EVOL Lab) at the China Telecom Artificial Intelligence Research Institute; and Zhao Hao, assistant professor at the Tsinghua University Intelligent Industry Research Institute (AIR), among others.

Recently, 3D Gaussian Splatting (3DGS), a novel 3D representation, has attracted attention for its fast rendering speed and high rendering quality. However, this approach also consumes a large amount of memory: a trained Gaussian field may use more than three million Gaussian primitives and more than 700 MB of memory.
Recently, researchers from Imperial College London, Beihang University, Beijing Institute of Technology, the University of Chinese Academy of Sciences, the Multimedia Cognitive Learning Laboratory (EVOL Lab) of the China Telecom Artificial Intelligence Research Institute, the Tsinghua University Intelligent Industry Research Institute (AIR), and other institutions jointly published the paper "SUNDAE: Spectrally Pruned Gaussian Fields with Neural Compensation". We believe that this high memory usage stems from a failure to consider the relationships between primitives. In the paper, we propose SUNDAE, a memory-efficient Gaussian field that uses spectral pruning and neural compensation.
- Paper link: https://arxiv.org/abs/2405.00676
- Project homepage: https://runyiyang.github.io/projects/SUNDAE/
On the one hand, we construct a graph based on the spatial information of the Gaussian primitives to model the relationships between them, and we design a down-sampling module based on graph signal processing that prunes primitives while retaining the desired signal. On the other hand, to compensate for the quality degradation caused by pruning, we use a lightweight neural network to blend rendering features, which compensates for the degradation while capturing the relationships between primitives in its weights.
We demonstrate the performance of SUNDAE through extensive experiments. For example, on the Mip-NeRF360 dataset, SUNDAE achieves 26.80 PSNR and 145 FPS using 104 MB of memory (about one fifth of the baseline's footprint), while the standard 3D Gaussian Splatting algorithm achieves 25.60 PSNR and 160 FPS using 523 MB of memory.
Since its open-source release, SUNDAE has also received widespread international attention, with retweets and follows from the well-known NeRF community MrNeRF, AI research community maintainer Ahsen Khaliq, and many researchers in related fields.
1. Spectrally pruned Gaussian fields with neural compensation
1.1 Spectral-graph-based pruning strategy

3DGS uses a set of Gaussian primitives to represent the scene. Since these primitives are distributed irregularly in three-dimensional space, we propose a graph-based approach to capture the relationships between primitives instead of using conventional structures such as grids.
Specifically, we adopt graph signal processing theory to derive an optimal sampling strategy that retains specific spectral information of the graph signal. By controlling the spectral bandwidth, we can flexibly control the pruning ratio and model the relationships between Gaussian primitives. As shown in Figure 1(c), we can prune 90% of the Gaussian primitives without degrading rendering quality.
Figure 1: (a) The result of 3DGS after 7k iterations; (b) the result of 3DGS after 30k iterations, which uses more Gaussian primitives to represent the three-dimensional scene and therefore achieves higher quality at the cost of slower speed and larger storage; (c) 90% of the Gaussian primitives are pruned, which greatly reduces storage while achieving similar rendering quality.

We use the centers of the Gaussian primitives as the signal on the graph and the distances between the Gaussian primitives as the graph edges. The adjacency matrix is defined from the center points of the Gaussian primitives, a threshold hyperparameter, and the variance of the distance matrix: if the distance between two Gaussian primitives is smaller than the threshold, we establish a graph edge between them. After building the adjacency matrix, we process the signal on the graph with a Haar-like filter to obtain the graph signal in a specific frequency band. Finally, pruning is performed based on the signal in the desired frequency band. In this work, we use a band-stop filter to retain both the high-frequency signal, which represents object details, and the low-frequency signal of the background points.

1.2 Neural compensation mechanism

After spectral pruning, rendering quality inevitably degrades because too many Gaussian primitives have been removed. To solve this problem, we use a neural network to compensate for the quality loss, as shown in Figure 2. We move from Gaussian Splatting to Feature Splatting, introducing a lightweight convolutional neural network that maps the splatted features of the Gaussian primitives to RGB values on the image and thereby fuses information from different primitives. This allows the weights of the compensation network to indirectly capture the relationships between primitives in two-dimensional image space.
Figure 2: The original 3DGS, shown on the left, requires a large amount of storage because it does not capture the relationships between primitives; the middle shows our spectral pruning strategy, which models the relationships between Gaussian primitives; the right shows neural compensation, which uses 2D features to improve rendering.

Specifically, instead of rendering the RGB image directly as 3DGS does, we obtain a feature map through a differentiable rasterizer for 3D Gaussians, which projects the features of the 3D Gaussian primitives onto a 2D feature map. We then use a lightweight neural network to model the relationships between primitives and compensate for the quality degradation after spectral pruning. The network is a four-layer fully convolutional U-Net with skip connections, which aggregates information from different primitives. It uses average pooling for downsampling and bilinear interpolation for upsampling. The network takes the rasterized feature maps as input and outputs RGB images.

The overall framework of SUNDAE is shown in Figure 3 below.
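As a rough illustration of the resampling path described above, the following pure-Python sketch runs one down/up level on a single-channel feature map: 2x2 average pooling, 2x bilinear upsampling, and a skip connection. It is a minimal sketch under stated assumptions, not the SUNDAE network: the learned convolutions are omitted, the skip connection is simplified to element-wise addition, the half-pixel sampling convention is an assumption, and the function names (`avg_pool2x2`, `bilinear_upsample2x`, `down_up_with_skip`) are hypothetical.

```python
def avg_pool2x2(feat):
    """2x2 average pooling with stride 2 on a 2D list (height and width must be even)."""
    h, w = len(feat), len(feat[0])
    return [
        [(feat[y][x] + feat[y][x + 1] + feat[y + 1][x] + feat[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def bilinear_upsample2x(feat):
    """2x bilinear upsampling, assuming half-pixel centers and clamping at the borders."""
    h, w = len(feat), len(feat[0])
    out = []
    for oy in range(2 * h):
        # Map the output pixel center back into input coordinates, then clamp.
        sy = min(max((oy + 0.5) / 2.0 - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(2 * w):
            sx = min(max((ox + 0.5) / 2.0 - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = feat[y0][x0] * (1.0 - fx) + feat[y0][x1] * fx
            bottom = feat[y1][x0] * (1.0 - fx) + feat[y1][x1] * fx
            row.append(top * (1.0 - fy) + bottom * fy)
        out.append(row)
    return out


def down_up_with_skip(feat):
    """One encoder/decoder level: pool, upsample, then add the input back (skip path)."""
    coarse = avg_pool2x2(feat)
    restored = bilinear_upsample2x(coarse)
    # Assumption: the skip connection is an element-wise sum instead of a
    # channel concatenation followed by learned convolutions.
    return [[a + b for a, b in zip(row_in, row_up)]
            for row_in, row_up in zip(feat, restored)]
```

In a real U-Net each level would also apply learned convolutions before and after resampling; the sketch only shows how pooled information is brought back to full resolution and merged with the skip path.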
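Figure 3: (a) Pipeline: for a pre-trained 3D Gaussian field, a graph-based pruning strategy down-samples the Gaussian primitives, and a convolutional neural network compensates for the loss caused by pruning. (b) Graph-based pruning: a graph built on the spatial relationships between Gaussian primitives is used for pruning. By using a band-stop filter, this process extracts detail information from the high-frequency components while capturing general features in the low-frequency part, yielding a comprehensive and efficient representation of the whole scene.

In addition, we propose a continuous pruning strategy to reduce peak storage. Unlike post-training pruning, which removes primitives from a fully densified Gaussian field, continuous pruning periodically removes a specific number or proportion of primitives at predefined intervals throughout training. This approach aims to keep the maximum number of primitives under control while the 3D Gaussian field is being trained, which lowers the peak memory requirement during training and allows training on GPUs with less memory.

Empirically, the advantage of lower peak memory comes at the cost of weaker control over the final memory footprint. For example, if we prune 20% of the primitives every 2,000 iterations, the final converged state of the 3D Gaussian field may deviate from the expected 20% reduction. Moreover, this deviation can vary across scenes, which makes the pruning effect harder to predict and less consistent. We therefore treat continuous pruning as an alternative to be used when necessary.

We compare SUNDAE with state-of-the-art 3DGS and NeRF algorithms. Compared with 3DGS, our model achieves similar results using only 10% of the memory and surpasses the original 3DGS using 30% or 50% of the memory. It also far exceeds other NeRF-related algorithms in FPS. This is because our model better captures the relationships between Gaussian primitives and uses fewer primitives to represent the three-dimensional scene efficiently.

For the qualitative results, we compare SUNDAE at 1% and 10% sampling rates with 3DGS and InstantNGP. The results show that SUNDAE achieves similar novel view synthesis quality while using only 10% or even 1% of the memory. The graph successfully builds the relationships between primitives, and the neural compensation head effectively maintains rendering quality. Moreover, as the fourth and last rows of Figure 5 show, spectral pruning can remove floaters close to the camera.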
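The ratio of the band-stop filter is represented by a single parameter. Specifically, during graph-based pruning we sample a number of primitives, of which a certain proportion come from the high-pass signal and the remainder from the low-pass signal. The results show that this parameter has a significant effect on rendering quality. A ratio of 50% gives the best results, while a disproportionate emphasis on either low-frequency or high-frequency signals degrades quality, because the 50% ratio keeps a balance between high-frequency details and low-frequency background.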
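The band-stop sampling described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the thresholded Gaussian-kernel edge weight, the use of the distance variance as the kernel scale, the neighbor-averaging high-pass response (standing in for the Haar-like filter), the deterministic top/bottom ranking (standing in for sampling), and the names `build_adjacency`, `high_pass_response`, `band_stop_prune`, `tau`, `keep_ratio`, and `gamma` are all assumptions made for this example.

```python
import math


def build_adjacency(centers, tau):
    """Connect primitives whose centers are closer than tau.

    Assumption: the edge weight is exp(-d^2 / (2 * var)), where var is the
    variance of all off-diagonal pairwise distances.
    """
    n = len(centers)
    dist = [[math.dist(centers[i], centers[j]) for j in range(n)] for i in range(n)]
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off_diag) / len(off_diag) if off_diag else 0.0
    var = sum((d - mean) ** 2 for d in off_diag) / len(off_diag) if off_diag else 0.0
    scale = var if var > 0.0 else 1.0  # fall back when all distances are equal
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and dist[i][j] < tau:
                adj[i][j] = math.exp(-dist[i][j] ** 2 / (2.0 * scale))
    return adj


def high_pass_response(centers, adj):
    """Distance between each center and the weighted mean of its neighbors.

    Large values mark locally irregular (high-frequency) regions; small values
    mark smooth (low-frequency) regions. Isolated nodes get a response of 0.
    """
    dims = len(centers[0])
    responses = []
    for i, row in enumerate(adj):
        total = sum(row)
        if total == 0.0:
            responses.append(0.0)
            continue
        smoothed = [sum(w * c[d] for w, c in zip(row, centers)) / total
                    for d in range(dims)]
        responses.append(math.dist(centers[i], smoothed))
    return responses


def band_stop_prune(centers, tau, keep_ratio, gamma):
    """Keep a share gamma of high-response primitives and 1 - gamma of low-response ones."""
    n = len(centers)
    k = max(1, round(keep_ratio * n))   # total number of primitives to keep
    n_high = round(gamma * k)           # how many come from the high-pass side
    resp = high_pass_response(centers, build_adjacency(centers, tau))
    by_high = sorted(range(n), key=lambda i: (-resp[i], i))
    high = by_high[:n_high]
    # Mid-band primitives are the ones dropped: take the lowest responses next.
    rest = sorted(by_high[n_high:], key=lambda i: (resp[i], i))
    low = rest[:k - n_high]
    return sorted(high + low)
```

For five centers `(0,0,0)`, `(1,0,0)`, `(2,0,0)`, `(3,0,0)`, `(10,0,0)` with `tau=1.5`, `keep_ratio=0.8`, and `gamma=0.5`, the function keeps four primitives, and the two endpoints of the connected segment (indices 0 and 3) are the ones picked on the high-response side. Setting `gamma` to 0 or 1 keeps only one end of the spectrum, which corresponds to the unbalanced settings that the ablation reports as worse.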
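As shown in Figure 6 and Table 2, we demonstrate the importance of the compensation network both qualitatively and quantitatively. Table 2 shows that, at every sampling rate, using neural compensation improves performance compared with not using it. This is further supported by the visualizations in Figure 6, which show the module's ability to mitigate the performance drop caused by spectral pruning. It also indicates that the relationships between primitives are well captured.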
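As shown in Table 3, we tried compensation networks of different sizes. Increasing the network size does not necessarily improve rendering quality, which is consistent with the findings of ADOP. We adopt a 30 MB four-layer U-Net as the default setting to best balance quality and memory.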
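As shown in Table 1 above, retaining 50% of the primitives outperforms the original 3DGS in rendering quality. We additionally tested retaining 80% of the primitives and retaining all of them, to examine how the sampling rate affects the final result, as shown in Table 4.

The results show that retaining 80% of the primitives improves rendering quality according to LPIPS, but brings only marginal gains in PSNR and SSIM. Retaining all primitives (and training for more epochs) does not improve quality any further, which again shows the importance of modeling the relationships between primitives. Without effective relationship modeling, more primitives make the model harder to converge, and a large number of primitives has a negative effect on the scene representation.

Moreover, our goal is to balance rendering quality and storage efficiency. Increasing storage to 620 MB to retain 80% of the primitives brings only a slight quality improvement and therefore lowers storage efficiency.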
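We tested the continuous pruning strategy on the Bicycle and Counter scenes of the MipNeRF360 dataset with different pruning intervals (in iterations) and pruning rates. In Table 5, Points is the number of primitives after training, and Ratio is the approximate ratio of the number of primitives after training to that of the original 3DGS.

The results show that this strategy can reduce peak memory but makes the final memory hard to control (as reflected by Points and Ratio). We therefore validate our post-training pruning strategy as the default, while still providing continuous pruning as an alternative in our open-source toolbox.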
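To make the peak-versus-final trade-off concrete, here is a toy simulation of continuous pruning. It is only a sketch: the constant growth of primitives per iteration is a made-up stand-in for densification, and the function name `simulate_continuous_pruning` and all parameter values are hypothetical, not settings from the paper.

```python
def simulate_continuous_pruning(initial, growth_per_iter, iterations, interval, prune_rate):
    """Track the peak and final primitive counts under periodic pruning.

    Assumption: densification adds a constant number of primitives per
    iteration, which real training does not do.
    """
    count = initial
    peak = initial
    for it in range(1, iterations + 1):
        count += growth_per_iter          # toy densification step
        peak = max(peak, count)           # peak is measured before pruning
        if it % interval == 0:
            count = round(count * (1.0 - prune_rate))  # remove a fixed share
    return peak, count


# Hypothetical run: prune 20% of the primitives every 2,000 iterations.
baseline_peak, baseline_final = simulate_continuous_pruning(100_000, 100, 6_000, 2_000, 0.0)
pruned_peak, pruned_final = simulate_continuous_pruning(100_000, 100, 6_000, 2_000, 0.2)
```

With these made-up numbers, periodic pruning lowers the peak from 700,000 to 552,000 primitives, while the final count (441,600, about 63% of the unpruned run) is not simply a 20% reduction. Because the final count depends on how growth and pruning interleave, it is harder to predict than with a single post-training pruning step.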
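For details on training time, CUDA memory, rendering frame rate, and ROM storage, see Table 6. Notably, the "Ours-50%" version achieves the best rendering quality within an acceptable training time (1.41 hours), while also achieving real-time rendering and significantly reducing both CUDA memory usage during training and ROM storage.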
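In this work, we propose SUNDAE, a novel spectrally pruned Gaussian field with neural compensation. It introduces graph signal processing to model the relationships between Gaussian primitives and blends information from different primitives to compensate for the information loss caused by pruning.

We build a graph from the spatial information of the Gaussian primitives to model their relationships, and we prune according to spectral information to remove redundant primitives. A lightweight neural network compensates for the inevitable loss of rendering quality after pruning.

Experimental results show that SUNDAE retains the efficiency of 3DGS while significantly reducing memory and maintaining high-fidelity rendering quality.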