Neural NeRF Compression

Authors: Tuan Pham, Stephan Mandt

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results validate that our proposed method surpasses existing works in terms of grid-based NeRF compression efficacy and reconstruction quality.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Irvine.
Pseudocode | Yes | Algorithm 1: TensoRF-VM compression
Open Source Code | No | The paper does not provide an unambiguous statement about releasing code or a link to a code repository.
Open Datasets | Yes | Synthetic-NeRF (Mildenhall et al., 2021): This dataset contains 8 scenes at resolution 800×800 rendered by Blender. Each scene contains 100 training views and 200 testing views. Synthetic-NSVF (Liu et al., 2020): This dataset also contains 8 rendered scenes at resolution 800×800. LLFF (Mildenhall et al., 2019): LLFF contains 8 real-world scenes made of forward-facing images with non-empty backgrounds. Tanks and Temples (Knapitsch et al., 2017): We use 5 real-world scenes (Barn, Caterpillar, Family, Ignatius, Truck) from the Tanks and Temples dataset for our experiments.
Dataset Splits | No | The paper mentions training and testing views but does not explicitly specify a validation dataset split or a methodology for creating one.
Hardware Specification | Yes | All experimental procedures are executed using PyTorch (Paszke et al., 2019) on NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | Hyperparameters. As discussed in Section 3.1, our decoder has two transposed convolutional layers with SELU activation (Klambauer et al., 2017). Both have a kernel size of 3, stride 2, and padding 1; each layer therefore has an upsampling factor of 2. Given a feature plane of size C_i × W_i × H_i, we initialize the corresponding latent code Z_i to have size C_{Z_i} × W_i/4 × H_i/4. A decoder with more parameters enhances the model's decoding ability while also increasing its size. In light of this trade-off, we introduce two configurations: ECTensoRF-H (Entropy-Coded TensoRF, high compression) employs latent codes with 192 channels and a decoder with 96 hidden channels, while ECTensoRF-L (low compression) uses 384 latent channels and 192 decoder hidden channels. Regarding the hyperparameter λ, we experiment within the set {0.02, 0.01, 0.005, 0.001, 0.0005, 0.0002, 0.0001}, with higher λ signifying a more compact model. We train our models for 30,000 iterations with the Adam optimizer (Kingma & Ba, 2015), using an initial learning rate of 0.02 for the latent codes and 0.001 for the networks, and apply exponential learning rate decay.
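The decoder and optimizer setup quoted above pins down most of the architecture, so a sketch is possible. The following is a minimal PyTorch illustration, not the authors' released code: the class name FeatureDecoder, the zero initialization of the latents, the example plane size, output_padding=1 (needed for each transposed conv to double the spatial size exactly), the placeholder loss, and the decay factor are all assumptions.

```python
# Hedged sketch of the decoder/training setup described in the excerpt above.
# Assumptions are marked in comments; the paper's implementation may differ.
import torch
import torch.nn as nn

class FeatureDecoder(nn.Module):
    """Two transposed-conv layers (kernel 3, stride 2, padding 1) with SELU,
    each upsampling by 2x, so a latent at (W/4, H/4) decodes to (W, H)."""
    def __init__(self, latent_channels, hidden_channels, out_channels):
        super().__init__()
        # output_padding=1 is an assumption so each layer exactly doubles
        # the spatial resolution: out = (in - 1)*2 - 2 + 3 + 1 = 2*in.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, hidden_channels,
                               kernel_size=3, stride=2, padding=1,
                               output_padding=1),
            nn.SELU(),
            nn.ConvTranspose2d(hidden_channels, out_channels,
                               kernel_size=3, stride=2, padding=1,
                               output_padding=1),
        )

    def forward(self, z):
        return self.net(z)

# ECTensoRF-H: 192 latent channels, 96 hidden channels (per the excerpt).
# ECTensoRF-L would use 384 latent channels and 192 hidden channels.
C_out, W, H = 48, 256, 256                 # example feature-plane size (assumed)
decoder = FeatureDecoder(192, 96, C_out)

# Latent code Z_i at 1/4 the plane's spatial resolution; zero init is assumed.
z = nn.Parameter(torch.zeros(1, 192, W // 4, H // 4))
plane = decoder(z)                         # -> shape (1, C_out, W, H)

# Adam with per-group learning rates (0.02 latents, 0.001 networks)
# and exponential decay, as stated; the decay factor here is an assumption.
optimizer = torch.optim.Adam([
    {"params": [z], "lr": 0.02},                    # latent codes
    {"params": decoder.parameters(), "lr": 0.001},  # networks
])
scheduler = torch.optim.lr_scheduler.ExponentialLR(
    optimizer, gamma=0.1 ** (1 / 30_000))

for step in range(30_000):
    optimizer.zero_grad()
    plane = decoder(z)
    # Placeholder objective: the paper's actual loss combines rendering
    # reconstruction with a λ-weighted rate (entropy) term on the latents.
    loss = plane.square().mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Splitting the learning rates into two parameter groups mirrors the excerpt directly; everything else in the loop is scaffolding to show where the λ-weighted rate-distortion objective would plug in.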