LEAP: Liberate Sparse-View 3D Modeling from Camera Poses
Authors: Hanwen Jiang, Zhenyu Jiang, Yue Zhao, Qixing Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a thorough evaluation of LEAP on a diverse array of object-centric (Wu et al., 2023; Jiang et al., 2022; Deitke et al., 2022) and scene-level (Jensen et al., 2014) datasets. This assessment spans multiple data scales and incorporates both synthetic and real images. Experimental results highlight LEAP's four interesting properties: i) Superior performance. LEAP consistently synthesizes novel views from 2-5 unposed images. It surpasses prior generalizable NeRFs when they use camera poses predicted by SOTA pose estimators. It performs on par with methods using ground-truth camera poses. |
| Researcher Affiliation | Academia | Hanwen Jiang Zhenyu Jiang Yue Zhao Qixing Huang Department of Computer Sciences, University of Texas at Austin |
| Pseudocode | No | The paper describes the model architecture and processes using text and figures, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://hwjiang1510.github.io/LEAP/ We are committed to releasing code for reproducibility and future research. |
| Open Datasets | Yes | We train LEAP on each of the following datasets and test its capability to model the 3D object/scenes on each dataset that has different properties. We note that these datasets are captured by wide-baseline cameras, with randomly sampled or fixed camera poses that are far from each other. OmniObject3D (Wu et al., 2023) contains daily objects from 217 categories. We use a subset with 4800 instances for training and 498 instances for testing. Kubric-ShapeNet (Jiang et al., 2022) is a synthetic dataset generated using Kubric (Greff et al., 2022). Objaverse (Deitke et al., 2022) is one of the largest object-centric datasets. DTU dataset (Jensen et al., 2014) is a real scene-level dataset. |
| Dataset Splits | Yes | We use a subset with 4800 instances for training and 498 instances for testing. Its training set has 1000 instances for each of 13 ShapeNet (Chang et al., 2015) categories, resulting in 13000 training samples. Its test set is composed of two parts: i) 1300 object instances from training categories; ii) 1000 object instances from 10 novel object categories. We use a subset of 200k and 2k instances for training and testing. For training, we use 5 randomly sampled image sets from each scene as the inputs as well as targets. |
| Hardware Specification | No | LEAP constructs the radiance field in a feed-forward manner without optimization, running within one second on a single consumer-grade GPU. The paper mentions the use of a 'single consumer-grade GPU' but does not specify any particular model, processor, or other detailed hardware specifications. |
| Software Dependencies | No | We use a DINOv2-initialized ViT (Oquab et al., 2023; Dosovitskiy et al., 2020) as the feature extractor... We use AdamW optimizer (Loshchilov & Hutter, 2017). The paper mentions specific models and optimizers but does not provide version numbers for any software libraries or frameworks used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | We consider the number of views to be k = 5, with image resolution 224. We set λp = 0.1 and λm = 5.0. We set the peak learning rate as 2e-5 (for the backbone) and 2e-4 (for other components) with a warmup for 500 iterations using AdamW optimizer (Loshchilov & Hutter, 2017). We train the model for 150k iterations and use a linear learning rate scheduler, where the batch size is 32. LEAP has ne = 2 multi-view encoder blocks and nm = 4 2D-3D mapping blocks. The resolution of the 3D neural volume and the volume-based radiance fields are 16³ and 64³, respectively. We sample 64 points on each ray for rendering. |
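
The Software Dependencies and Experiment Setup rows together describe the reported training configuration: an AdamW optimizer with peak learning rates of 2e-5 for the DINOv2-initialized ViT backbone and 2e-4 for the remaining components, a 500-iteration warmup, a linear learning-rate schedule over 150k iterations, and batch size 32. Below is a minimal PyTorch-style sketch of that optimizer and scheduler setup; the `backbone` and `other_components` module handles are hypothetical placeholders, and this is a reconstruction from the quoted numbers, not the authors' released code.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(backbone, other_components,
                                  warmup_iters=500, total_iters=150_000):
    """Hedged reconstruction of the reported schedule: AdamW with per-group
    peak learning rates (2e-5 for the backbone, 2e-4 elsewhere), a
    500-iteration warmup, then linear decay over 150k training iterations."""
    optimizer = AdamW([
        {"params": backbone.parameters(), "lr": 2e-5},          # DINOv2-initialized ViT
        {"params": other_components.parameters(), "lr": 2e-4},  # remaining LEAP modules
    ])

    def lr_lambda(it):
        # Linear warmup for the first warmup_iters steps, then linear decay to zero.
        if it < warmup_iters:
            return it / warmup_iters
        return max(0.0, (total_iters - it) / (total_iters - warmup_iters))

    scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)
    return optimizer, scheduler
```

In this sketch a single schedule factor scales both groups' peak learning rates, which matches the one warmup-plus-linear-decay scheduler described in the setup row.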