Efficient-3Dim: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

Authors: Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments are conducted to demonstrate the efficiency and generalizability of our proposed method.
Researcher Affiliation | Collaboration | Apple; University of Texas at Austin
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of source code for the described methodology.
Open Datasets | Yes | We employ the recently released Objaverse (Deitke et al., 2023) dataset for training.
Dataset Splits | Yes | We adopt 792k samples as the training set and 8k other samples for validation.
Hardware Specification | Yes | All of our experiments are conducted on 8 Nvidia-A100 GPUs using the PyTorch-Lightning-1.4.2 platform.
Software Dependencies | Yes | All of our experiments are conducted on 8 Nvidia-A100 GPUs using the PyTorch-Lightning-1.4.2 platform.
Experiment Setup | Yes | We apply a batch size of 48 per GPU and adopt gradient accumulation over 4 steps, so the effective batch size is 192 * 8 in total. We adopt an Adam optimizer with β1 = 0.9, β2 = 0.999, and 0.01 weight decay. We adopt a half-period cosine learning-rate decay schedule with a base learning rate of 1e-4, a final learning rate of 1e-5, and the maximum training iterations set to 30,000. A linear warm-up strategy is applied to the first 200 steps. We also use exponential moving average weights for the UNet denoiser. We select ViT-L/14 for both the CLIP encoder and the DINO-v2 encoder for a fair comparison.
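The experiment-setup excerpt above fully specifies the learning-rate schedule: linear warm-up for 200 steps, then a half-period cosine decay from 1e-4 to 1e-5 over 30,000 iterations. As a minimal sketch of that schedule in plain Python (the function name `lr_at` and its keyword defaults are our own illustration, not code from the paper):

```python
import math

def lr_at(step, base_lr=1e-4, final_lr=1e-5, warmup_steps=200, total_steps=30_000):
    """Half-period cosine LR decay with linear warm-up, per the quoted setup."""
    if step < warmup_steps:
        # Linear warm-up from 0 to the base learning rate over the first 200 steps.
        return base_lr * step / warmup_steps
    # Cosine decay over the remaining steps: progress goes 0 -> 1,
    # cos(pi * progress) goes 1 -> -1, so the LR goes base_lr -> final_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

# Effective batch size implied by the setup:
# 48 per GPU x 4 accumulation steps x 8 GPUs = 192 * 8 = 1536.
effective_batch = 48 * 4 * 8
```

Note that at the end of warm-up the schedule reaches exactly the base rate (`lr_at(200)` is 1e-4), and at step 30,000 it reaches exactly the final rate (1e-5), matching the quoted hyperparameters.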