Neural Sparse Voxel Fields
Authors: Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, Christian Theobalt
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed NSVF on several tasks including multi-scene learning, rendering of dynamic and large-scale indoor scenes, and scene editing and composition. We also perform ablation studies to validate different kinds of feature representations and different options in progressive training. |
| Researcher Affiliation | Collaboration | Max Planck Institute for Informatics; Facebook AI Research; National University of Singapore |
| Pseudocode | Yes | Our approach is summarized in Algorithm 1 where we additionally return the transparency A, and the expected depth Z which can be further used for visualizing the normal with finite difference. |
| Open Source Code | Yes | We have open-sourced our codebase at https://github.com/facebookresearch/NSVF |
| Open Datasets | Yes | Datasets (1) Synthetic-NeRF: The synthetic dataset used in Mildenhall et al. (2020) includes eight objects. (2) Synthetic-NSVF: We additionally render eight objects in the same resolution with more complex geometry and lighting effects. (3) BlendedMVS: We test on four objects from Yao et al. (2020). The rendered images are blended with the real images to have realistic ambient lighting. (4) Tanks & Temples: We evaluate on five objects from Knapitsch et al. (2017) where we use the images and label the object masks ourselves. (5) ScanNet: We use two real scenes from ScanNet (Dai et al., 2017). We extract both RGB and depth images from the original video. (6) Maria Sequence: This sequence is provided by Volucap with the meshes of 200 frames of a moving female. We render each mesh to create a dataset. |
| Dataset Splits | No | The paper mentions training and testing, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for its datasets. |
| Hardware Specification | Yes | For all scenes, we train NSVF using a batch size of 32 images on 8 Nvidia V100 GPUs, and for each image we sample 2048 rays. |
| Software Dependencies | No | The paper mentions using a 'multilayer perceptron network (MLP)' and 'positional encoding proposed by (Vaswani et al., 2017; Mildenhall et al., 2020)', but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We model NSVF with a 32-dimensional learnable voxel embedding for each vertex, and apply positional encoding following (Mildenhall et al., 2020). ...For all scenes, we train NSVF using a batch size of 32 images on 8 Nvidia V100 GPUs, and for each image we sample 2048 rays. ...For all the experiments, we prune the voxels periodically every 2500 steps and progressively halve the voxel and step sizes at 5k, 25k and 75k steps, respectively. |
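The pseudocode excerpt above notes that the paper's Algorithm 1 returns, alongside the rendered color, a transparency A and an expected depth Z. A minimal sketch of that volume-rendering accumulation step is shown below, assuming per-sample densities, colors, and depths are already available along one ray (in NSVF these come from the voxel-bounded MLP; the function and argument names here are illustrative, not the paper's):

```python
import numpy as np

def composite_ray(sigmas, colors, depths, step_size):
    """Hedged sketch of standard volume-rendering compositing for one ray.

    sigmas:  (N,)   densities at the N sample points (hypothetical inputs)
    colors:  (N, 3) RGB values at the sample points
    depths:  (N,)   distances of the sample points along the ray
    Returns accumulated color, transparency A, and expected depth Z.
    """
    # Per-sample opacity from density and marching step size.
    alphas = 1.0 - np.exp(-sigmas * step_size)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), shifted so T_0 = 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * trans                 # contribution of each sample
    color = (weights[:, None] * colors).sum(axis=0)
    A = weights.sum()                        # accumulated opacity ("transparency A")
    Z = (weights * depths).sum() / max(A, 1e-8)  # expected depth along the ray
    return color, A, Z
```

The paper mentions that Z can then be used to visualize normals via finite differences over neighboring pixels; that step is omitted here.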