Neural NeRF Compression
Authors: Tuan Pham, Stephan Mandt
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results validate that our proposed method surpasses existing works in terms of grid-based NeRF compression efficacy and reconstruction quality. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California, Irvine. |
| Pseudocode | Yes | Algorithm 1: TensoRF-VM compression |
| Open Source Code | No | The paper does not provide an unambiguous statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | Synthetic-NeRF (Mildenhall et al., 2021): This dataset contains 8 scenes at resolution 800×800 rendered by Blender. Each scene contains 100 training views and 200 testing views. Synthetic-NSVF (Liu et al., 2020): This dataset also contains 8 rendered scenes at resolution 800×800. LLFF (Mildenhall et al., 2019): LLFF contains 8 real-world scenes made of forward-facing images with non-empty backgrounds. Tanks and Temples (Knapitsch et al., 2017): We use 5 real-world scenes (Barn, Caterpillar, Family, Ignatius, Truck) from the Tanks and Temples dataset. |
| Dataset Splits | No | The paper mentions training and testing views but does not explicitly specify a validation dataset split or a methodology for creating one. |
| Hardware Specification | Yes | All experimental procedures are executed using PyTorch (Paszke et al., 2019) on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Hyperparameters. As discussed in Section 3.1, our decoder has two transposed convolutional layers with SELU activation (Klambauer et al., 2017). Both have a kernel size of 3, with stride 2 and padding 1; thus, each layer has an upsampling factor of 2. Given a feature plane sized C_i × W_i × H_i, we initialize the corresponding latent code Z_i to have the size C_{Z_i} × W_i/4 × H_i/4. A decoder with more parameters enhances the model's decoding ability while also increasing its size. In light of this trade-off, we introduce two configurations: ECTensoRF-H (Entropy-Coded TensoRF, high compression) employs latent codes with 192 channels and a decoder with 96 hidden channels, while ECTensoRF-L (low compression) has 384 latent channels and 192 decoder hidden channels. Regarding the hyperparameter λ, we experiment within the set {0.02, 0.01, 0.005, 0.001, 0.0005, 0.0002, 0.0001}, with higher λ signifying a more compact model. We train our models for 30,000 iterations with the Adam optimizer (Kingma & Ba, 2015). We use an initial learning rate of 0.02 for the latent codes and 0.001 for the networks, and apply an exponential learning rate decay. (See the sketch below this table.) |
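
To make the reported setup concrete, here is a minimal PyTorch sketch of the decoder and optimizer configuration quoted above. It is a sketch under stated assumptions, not the authors' implementation: the `output_padding` value, the SELU placement, the feature-plane size `C_i × W × H`, and the decay factor `gamma` are illustrative choices not specified in the paper.

```python
import torch
import torch.nn as nn


class PlaneDecoder(nn.Module):
    """Decoder with two transposed-conv layers (kernel 3, stride 2, padding 1),
    each upsampling by 2x, mapping a latent code (C_z, W/4, H/4) to a
    feature plane (C_out, W, H)."""

    def __init__(self, latent_channels: int, hidden_channels: int, out_channels: int):
        super().__init__()
        # output_padding=1 is an assumption: combined with kernel 3 / stride 2 /
        # padding 1, it makes each layer an exact 2x upsampler. Applying SELU
        # after the first layer only is also an assumption.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, hidden_channels,
                               kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.SELU(),
            nn.ConvTranspose2d(hidden_channels, out_channels,
                               kernel_size=3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


# ECTensoRF-H configuration: 192 latent channels, 96 decoder hidden channels.
# C_i, W, H below are hypothetical; out_channels must match the channel count
# of the TensoRF feature plane being compressed.
C_i, W, H = 48, 256, 256
decoder = PlaneDecoder(latent_channels=192, hidden_channels=96, out_channels=C_i)
z = torch.randn(1, 192, W // 4, H // 4, requires_grad=True)  # latent code Z_i
plane = decoder(z)  # shape (1, C_i, W, H)

# Adam with the reported per-group learning rates (0.02 for latent codes,
# 0.001 for networks) and exponential decay; the gamma value is illustrative.
optimizer = torch.optim.Adam([
    {"params": [z], "lr": 0.02},                    # latent codes
    {"params": decoder.parameters(), "lr": 0.001},  # networks
])
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9999)
```

In training, λ would weight the rate term of a rate-distortion objective over the 30,000 iterations; the sketch omits the entropy model and loss, which the quoted row does not specify.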