DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model
Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results for the single-image reconstruction problem where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results outperforming previous 3D diffusion models. In this section, we present an extensive evaluation of our method. |
| Researcher Affiliation | Collaboration | 1Adobe Research 2Stanford 3HKU 4HKUST 5TTIC |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | No | Our project website is at: https://justimyhxu.github.io/projects/dmv3d/. The paper links a project website, but does not state that it hosts the source code for the described methodology, and the reproducibility statement provides no direct link to code. |
| Open Datasets | Yes | We use rendered multi-view images of 730k objects from the Objaverse (Deitke et al., 2023) dataset. ... To train our text-to-3D model, we use the object captions provided by Cap3D (Luo et al., 2023)... For image-conditioned (single-view reconstruction) model, we combine the Objaverse data with additional real captures of 220k objects from the MVImgNet (Yu et al., 2023) dataset... |
| Dataset Splits | No | The paper specifies datasets used for training and testing, but does not explicitly mention a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | Yes | We use 128 NVIDIA A100 GPUs to train this model... The small model is trained with 32 NVIDIA A100 GPUs for 200K steps (4 days). |
| Software Dependencies | No | Our experiments are implemented in PyTorch and the codebase is built upon guided diffusion (Dhariwal & Nichol, 2021). However, specific version numbers for PyTorch or guided diffusion are not provided. |
| Experiment Setup | Yes | We use AdamW optimizer to train our model with an initial learning rate of 4e-4. We also apply a warm-up of 3K steps and a cosine decay on the learning rate. We train our denoiser with 256×256 input images and render 128×128 image crops for supervision. ... with a batch size of 8 per GPU for 100K steps... Please refer to Tab. 6 in the appendix for an overview of the hyper-parameter settings. |
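The quoted learning-rate schedule (3K-step linear warm-up followed by cosine decay from an initial 4e-4) can be sketched as a small function. This is a minimal sketch, not the authors' code: the linear shape of the warm-up, the decay floor of zero, and the 100K total-step horizon are assumptions inferred from the quoted training length.

```python
import math

def lr_at_step(step, base_lr=4e-4, warmup_steps=3_000, total_steps=100_000):
    """Learning rate at a given training step.

    Linear warm-up from 0 to base_lr over warmup_steps, then cosine
    decay down to 0 at total_steps. Assumed shape; the paper states
    only "warm-up of 3K steps and a cosine decay".
    """
    if step < warmup_steps:
        # Linear ramp during warm-up.
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

At step 3,000 the schedule reaches the full 4e-4, and it falls back to zero by step 100,000; PyTorch users would typically express the same shape with a `LambdaLR` wrapping an `AdamW` optimizer.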