Neural Groundplans: Persistent Neural Scene Representations from a Single Image
Authors: Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Andrei Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We outperform PixelNeRF (Yu et al., 2020) and uORF (Yu et al., 2022) in terms of PSNR, SSIM, and LPIPS on both the CLEVR and CoSY datasets. We train on 8000 scenes, and evenly split the rest into validation and test sets. |
| Researcher Affiliation | Collaboration | MIT CSAIL, MIT BCS, MIT CBMM, Toyota Research Institute, The NSF AI Institute for Artificial Intelligence and Fundamental Interactions |
| Pseudocode | No | The paper describes the method in text and uses figures, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'Datasets and code will be made publicly available.' and 'The dataset will be publicly released for further research in this direction.', indicating future availability, not current access. A project webpage link is provided, but it doesn't explicitly state that the source code is hosted there. |
| Open Datasets | Yes | Our method is trained on multi-view observations of dynamic scenes. We present results on the moving CLEVR dataset (Yu et al., 2022), commonly used for self-supervised object discovery benchmarks (Yu et al., 2022), and the procedurally generated autonomous driving dataset CoSY (Bhandari, 2018). |
| Dataset Splits | Yes | We train on 8000 scenes, and evenly split the rest into validation and test sets. For CLEVR: Our dataset consists of 1500 samples, divided into 1000 train and 500 test samples. |
| Hardware Specification | Yes | The model was trained on a single 32GB V100 GPU with a batch of 4 input samples, each with 2 timesteps for N views (N=5). |
| Software Dependencies | No | The paper mentions software like 'sklearn' and 'Python' and tools like 'Blender' and 'CityEngine' for dataset generation, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We use Adam (Kingma & Ba, 2014) with a learning rate of 3e-4 to train our pipeline with the image reconstruction loss (L2), hard surfaces loss, and the alpha sparsity loss for 200 epochs. The losses were weighted by λLPIPS = 1, λHSL = 0.1, and λsparse = 0.01. The model was then further finetuned by adding the LPIPS loss weighted by λlpips = 0.5. We sampled 1e4 rays to compute the loss for each training sample in the input batch in the initial phase. During the first phase of training, the rays are sampled randomly; in the second phase, when the LPIPS loss is applied, we sample rays to render image patches of 16×16. A minimal sketch of this two-phase schedule follows the table. |
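
To make the quoted training schedule concrete, below is a minimal PyTorch sketch of the two-phase loss setup, not the authors' code. Only the learning rate, loss weights, ray count, and 16×16 patch size come from the quotes above; the `DummyRenderer` interface, the exact forms of the hard-surface and alpha-sparsity terms, and the use of the `lpips` PyPI package for the perceptual loss are assumptions for illustration.

```python
# Sketch of the quoted two-phase schedule (assumed interfaces, illustrative loss forms).
import torch
import torch.nn as nn
import torch.nn.functional as F
import lpips  # pip install lpips


class DummyRenderer(nn.Module):
    """Hypothetical stand-in for the groundplan model: maps ray samples to RGB + opacity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(6, 4)  # toy parameters so Adam has something to train

    def forward(self, rays):  # rays: (R, 6) origin + direction
        out = self.net(rays)
        return torch.sigmoid(out[:, :3]), torch.sigmoid(out[:, 3])  # rgb, alpha


def hard_surface_loss(alpha):
    # Illustrative stand-in: encourage per-ray opacities to be near 0 or 1.
    return -torch.log(torch.exp(-alpha) + torch.exp(-(1.0 - alpha))).mean()


def alpha_sparsity_loss(alpha):
    # Illustrative stand-in: penalize overall opacity.
    return alpha.mean()


model = DummyRenderer()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)    # learning rate quoted above
w_recon, w_hsl, w_sparse, w_lpips = 1.0, 0.1, 0.01, 0.5      # loss weights quoted above
perceptual = lpips.LPIPS(net="vgg")


def phase1_step(rays, rgb_gt):
    """Phase 1: 1e4 randomly sampled rays per training sample, no perceptual term."""
    rgb, alpha = model(rays)                                  # rays: (10_000, 6)
    loss = (w_recon * F.mse_loss(rgb, rgb_gt)
            + w_hsl * hard_surface_loss(alpha)
            + w_sparse * alpha_sparsity_loss(alpha))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()


def phase2_step(patch_rays, patch_rgb_gt):
    """Phase 2 (fine-tuning): rays grouped into 16x16 patches so LPIPS can be applied."""
    rgb, alpha = model(patch_rays.reshape(-1, 6))
    pred = rgb.reshape(1, 16, 16, 3).permute(0, 3, 1, 2)      # (1, 3, 16, 16)
    gt = patch_rgb_gt.reshape(1, 16, 16, 3).permute(0, 3, 1, 2)
    loss = (w_recon * F.mse_loss(pred, gt)
            + w_hsl * hard_surface_loss(alpha)
            + w_sparse * alpha_sparsity_loss(alpha)
            + w_lpips * perceptual(pred * 2 - 1, gt * 2 - 1).mean())  # LPIPS expects [-1, 1]
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()


# Toy usage with random data, just to exercise both phases once.
phase1_step(torch.randn(10_000, 6), torch.rand(10_000, 3))
phase2_step(torch.randn(16 * 16, 6), torch.rand(16 * 16, 3))
```

The split into a random-ray phase and a patch-rendering phase mirrors the quoted setup: LPIPS is a perceptual metric over images, so it can only be applied once rays are sampled as contiguous patches rather than independently.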