DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results for the single-image reconstruction problem where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results outperforming previous 3D diffusion models. In this section, we present an extensive evaluation of our method.
Researcher Affiliation | Collaboration | 1 Adobe Research, 2 Stanford, 3 HKU, 4 HKUST, 5 TTIC
Pseudocode | No | The paper describes the model architecture and training process in text and diagrams but does not provide a formal pseudocode or algorithm block.
Open Source Code | No | Our project website is at: https://justimyhxu.github.io/projects/dmv3d/. The paper mentions a project website but does not identify it as a source-code repository for the described method; the reproducibility statement likewise provides no direct link to code.
Open Datasets | Yes | We use rendered multi-view images of 730k objects from the Objaverse (Deitke et al., 2023) dataset. ... To train our text-to-3D model, we use the object captions provided by Cap3D (Luo et al., 2023)... For the image-conditioned (single-view reconstruction) model, we combine the Objaverse data with additional real captures of 220k objects from the MVImgNet (Yu et al., 2023) dataset...
Dataset Splits | No | The paper specifies datasets used for training and testing, but does not explicitly mention a separate validation split with specific percentages or counts.
Hardware Specification | Yes | We use 128 NVIDIA A100 GPUs to train this model... The small model is trained with 32 NVIDIA A100 GPUs for 200K steps (4 days).
Software Dependencies | No | Our experiments are implemented in PyTorch and the codebase is built upon guided diffusion (Dhariwal & Nichol, 2021). However, specific version numbers for PyTorch or guided diffusion are not provided.
Experiment Setup | Yes | We use the AdamW optimizer to train our model with an initial learning rate of 4e-4. We also apply a warm-up of 3K steps and a cosine decay on the learning rate. We train our denoiser with 256×256 input images and render 128×128 image crops for supervision. ... with a batch size of 8 per GPU for 100K steps... Please refer to Tab. 6 in the appendix for an overview of the hyper-parameter settings.
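The reported schedule (initial learning rate 4e-4, 3K-step warm-up, cosine decay) can be sketched as a plain Python function; the linear shape of the warm-up and decaying to zero at 100K steps are assumptions, since the paper states only "warm-up" and "cosine decay". The resulting function (divided by the base rate) is the shape one would pass to `torch.optim.lr_scheduler.LambdaLR`.

```python
import math

BASE_LR = 4e-4        # initial learning rate reported in the paper
WARMUP_STEPS = 3_000  # warm-up length reported in the paper
TOTAL_STEPS = 100_000 # step count reported for training; decay horizon is an assumption

def lr_at(step: int) -> float:
    """Learning rate at a given training step: linear warm-up, then cosine decay.

    Linear warm-up and decay-to-zero at TOTAL_STEPS are assumptions;
    the paper specifies only the warm-up length and a cosine decay.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(3_000)` returns the full 4e-4 at the end of warm-up, and the rate falls smoothly toward zero as the step count approaches 100K.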