TinyNeRF: Towards 100x Compression of Voxel Radiance Fields

Authors: Tianli Zhao, Jiayuan Chen, Cong Leng, Jian Cheng

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on five inward-facing datasets, including Synthetic-NeRF (Mildenhall et al. 2020), which contains 8 objects with realistic images; Synthetic-NSVF (Liu et al. 2020), which contains 8 objects synthesized by NSVF; BlendedMVS (Yao et al. 2020), with realistic ambient lighting from real image blending; DeepVoxels (Sitzmann et al. 2019), with 4 Lambertian objects; and a real-world dataset, Tanks&Temples (Knapitsch et al. 2017). Quantitative results are shown in Tab. 2, and the results on the NeRF Synthetic dataset are plotted in Fig. 5. The actual training time of each method is also shown in the figure. Comparing the dots in Fig. 5, our method achieves a better trade-off between model size and rendering quality. Comparing the sizes of the dots in the figure, our method has only a small influence on the training time, typically converging in less than 8 minutes.
Researcher Affiliation | Collaboration | 1 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China. 2 Institute of Automation, Chinese Academy of Sciences, Beijing, China. 3 AIRIA, Nanjing, China. 4 Maicro.ai, Nanjing, China. 5 Southeast University, Nanjing, China.
Pseudocode | No | The paper describes the steps of its method in textual form and through a diagram, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | We implement our method in PyTorch with the block-wise DCT implemented in CUDA. The code is built on the recent state-of-the-art voxel-grid-based NeRF implementation DVGO (Sun, Sun, and Chen 2022): https://github.com/sunset1995/DirectVoxGO. The provided link points to the DVGO baseline, not the TinyNeRF code developed by the authors.
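Since the authors' CUDA kernel is not released, the following is a minimal sketch of what a block-wise 3D DCT over a voxel grid might look like in plain PyTorch. The function names `dct_matrix` and `blockwise_dct3d` are illustrative, not taken from the authors' code, and the sketch assumes grid sides divisible by the block size:

```python
import torch


def dct_matrix(B: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of size (B, B)."""
    n = torch.arange(B).float()
    k = n.view(-1, 1)
    D = torch.cos(torch.pi * (2 * n + 1) * k / (2 * B)) * (2.0 / B) ** 0.5
    D[0] /= 2 ** 0.5  # rescale the DC row so that D @ D.T == I
    return D


def blockwise_dct3d(grid: torch.Tensor, B: int = 4) -> torch.Tensor:
    """Apply a separable 3D DCT to each non-overlapping BxBxB block.

    grid: (X, Y, Z) tensor whose side lengths are divisible by B.
    Returns DCT coefficients of shape (X//B, Y//B, Z//B, B, B, B).
    """
    X, Y, Z = grid.shape
    D = dct_matrix(B).to(grid)
    # Split the grid into (X/B, Y/B, Z/B) blocks of shape (B, B, B).
    blocks = (grid
              .reshape(X // B, B, Y // B, B, Z // B, B)
              .permute(0, 2, 4, 1, 3, 5))
    # Separable DCT: contract the basis matrix along each block axis.
    return torch.einsum('ai,bj,ck,...ijk->...abc', D, D, D, blocks)
```

Because the basis is orthonormal, the inverse transform is the same contraction with the transposed matrices, which is presumably what makes coefficient pruning and quantization cheap to undo at render time.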
Open Datasets | Yes | We evaluate our method on five inward-facing datasets, including Synthetic-NeRF (Mildenhall et al. 2020), which contains 8 objects with realistic images; Synthetic-NSVF (Liu et al. 2020), which contains 8 objects synthesized by NSVF; BlendedMVS (Yao et al. 2020), with realistic ambient lighting from real image blending; DeepVoxels (Sitzmann et al. 2019), with 4 Lambertian objects; and a real-world dataset, Tanks&Temples (Knapitsch et al. 2017).
Dataset Splits | No | The paper specifies training iterations and when certain stages (pruning, quantization) are enabled, but it does not provide explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | Our method can compress the voxel grids by more than 100x with minimal sacrifice in rendering quality and speed. For example, we build our code on the recent state-of-the-art voxel-grid-based NeRF implementation DVGO (Sun, Sun, and Chen 2022); the model size can be significantly reduced from 200MB to 2MB, while the training time only grows from 3 minutes to 6.5 minutes on a single NVIDIA A100 GPU... The running speed is measured on a single Intel Core i7-8700 CPU.
Software Dependencies | No | We implement our method in PyTorch with the block-wise DCT implemented in CUDA. The code is built on the recent state-of-the-art voxel-grid-based NeRF implementation DVGO (Sun, Sun, and Chen 2022). The paper mentions PyTorch and CUDA but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | By default, the block size for the block-wise DCT is set to 4x4x4 because we find it achieves a good trade-off between compression, synthesis quality, and training speed. We keep all the hyperparameters the same as DVGO (Sun, Sun, and Chen 2022) for a fair comparison. The grid resolutions for all the scenes are set to 160^3. Pruning and quantization are only enabled during the fine-stage training. We keep the total number of optimization iterations at 20,000 with 8,192 camera rays per batch, where pruning-aware training is enabled after 5,000 iterations of common training, and quantization-aware training is further enabled after 12,000 iterations. The whole training process typically finishes in less than 10 minutes. The detailed pruning ratios and quantization bit-widths for different target model sizes are shown in Tab. 1.
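For readability, the staged schedule quoted above could be expressed roughly as follows. Only the iteration counts and batch size come from the paper; `sample_rays` and `train_step` are hypothetical stubs, not functions from the DVGO or TinyNeRF codebases:

```python
# Schedule constants from the quoted setup; the two helpers are
# hypothetical stand-ins for the actual DVGO-based training code.
TOTAL_ITERS = 20_000
PRUNE_START = 5_000      # pruning-aware training starts here
QUANT_START = 12_000     # quantization-aware training is added here
RAYS_PER_BATCH = 8_192


def sample_rays(n: int):
    """Hypothetical stand-in for the per-iteration camera-ray sampler."""
    ...


def train_step(batch, prune: bool, quantize: bool):
    """Hypothetical stand-in for one fine-stage optimization step."""
    ...


for it in range(TOTAL_ITERS):
    batch = sample_rays(RAYS_PER_BATCH)
    train_step(batch,
               prune=(it >= PRUNE_START),      # enabled after 5,000 iters
               quantize=(it >= QUANT_START))   # enabled after 12,000 iters
```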