MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses

Authors: Yang Fu, Ishan Misra, Xiaolong Wang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experiments): We empirically evaluate MonoNeRF and compare it to the existing approaches on three different tasks: monocular depth estimation, camera pose estimation, and single image novel view synthesis. We perform evaluations on indoor scenes. Compared to outdoor street views, indoor scenes have more structural variance and are more commonly used for evaluating all three tasks together.
Researcher Affiliation | Collaboration | 1 University of California, San Diego; 2 FAIR, Meta AI. Correspondence to: Xiaolong Wang <xiw012@ucsd.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | More qualitative results are available at: https://oasisyang.github.io/mononerf. The link is given for 'qualitative results' only and does not explicitly state that the source code for the method is available there.
Open Datasets | Yes | For depth estimation, we train on ScanNet (Dai et al., 2017). ... Beyond ScanNet (Dai et al., 2017), we also evaluate the depth estimation performance on NYU Depth V2 (Nathan Silberman & Fergus, 2012). ... we train the model only with RealEstate10K (Zhou et al., 2018) training data.
Dataset Splits | No | The paper mentions training on 'all training sequences' and evaluating on 'all testing sequences released in the official test split' for ScanNet, and training on 'training data' for RealEstate10K. It does not describe a separate validation split, nor exact percentages or sample counts for any train/validation/test splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using an 'Adam optimizer' but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In the pre-processing step, we resize all images to the resolution of 256×256 for both training and testing. During training, we randomly sample 3 frames per sequence with an interval of 5 as the input to ensure the camera motion is large enough. The number of planes D is set to 64 and the range of the camera frustum is predefined as [0.2, 20]. We train our model end-to-end using a batch size of 4 with an Adam optimizer for 10 epochs. The initial learning rate is set to 0.0001 and is halved at epochs 4, 6, and 8. We empirically set the balance parameters λ_L1, λ_ssim, λ_smooth, λ_consist, and λ_reproj in Eq. 13 to 1.0, 1.0, 1.0, 0.01, 1.0, and 30, respectively.
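
For readers gauging how much of this setup is reproducible from the text alone, the quoted hyperparameters can be collected into a short training-loop sketch. This is a minimal sketch assuming a PyTorch implementation (the framework is not named in the paper); the placeholder model, dummy data, depth-plane spacing, and the mapping of the listed balance values onto the five named λ terms are assumptions, not the authors' code.

```python
# Sketch of the reported MonoNeRF training configuration (assumed PyTorch).
# Only the hyperparameters quoted above are taken from the paper; everything
# else (model, data, loss terms) is a hypothetical stand-in.
import torch
from torch import nn

model = nn.Linear(8, 1)  # placeholder for the MonoNeRF network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial lr 0.0001
# "halved at 4, 6, 8 epochs" over a 10-epoch schedule
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[4, 6, 8], gamma=0.5
)

# D = 64 planes over the predefined frustum range [0.2, 20]; the paper does not
# say whether planes are spaced uniformly in depth or inverse depth (uniform
# depth is assumed here).
depth_planes = torch.linspace(0.2, 20.0, steps=64)

# Balance parameters of Eq. 13. The paper lists six values for five named
# weights; this assignment of the first five values is an assumed mapping.
loss_weights = {"l1": 1.0, "ssim": 1.0, "smooth": 1.0, "consist": 0.01, "reproj": 1.0}

for epoch in range(10):
    for _ in range(100):  # stand-in for the training loader: batch size 4,
        # 3 frames sampled per sequence at an interval of 5, images at 256x256
        frames = torch.randn(4, 8)
        pred = model(frames)
        # dummy per-term losses; a real run would compute L1, SSIM, smoothness,
        # consistency, and reprojection terms from rendered and input frames
        per_term = {k: (pred ** 2).mean() for k in loss_weights}
        total = sum(loss_weights[k] * per_term[k] for k in loss_weights)
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
    scheduler.step()
```

The MultiStepLR schedule with gamma=0.5 at milestones [4, 6, 8] is one direct way to realize the stated "halved at 4, 6, 8 epochs" rule; any equivalent manual lr adjustment would do.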