MVDream: Multi-view Diffusion for 3D Generation
Authors: Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on three tasks: (1) multi-view image generation for evaluating image quality and consistency (Sec. 4.1), (2) 3D (NeRF) generation with multi-view score distillation as a main downstream task (Sec. 4.2), and (3) DreamBooth for personalized 3D generation (Sec. 4.3). |
| Researcher Affiliation | Collaboration | ¹ByteDance, USA; ²University of California, San Diego. {yichun.shi,peng.wang,kejie.li,xiao.yang}@bytedance.com, {jianglong.yeh,mai.t.long88}@gmail.com |
| Pseudocode | Yes | Algorithm 1: Pseudocode for MVDream training (an illustrative sketch of such a training step appears after this table) |
| Open Source Code | No | Our project page is https://MV-Dream.github.io. Besides, we will release our code as well as model checkpoints publicly after the paper submission. The latter statement promises a future release rather than current availability, and the project page does not explicitly state that it hosts the code for the paper. |
| Open Datasets | Yes | We fine-tune the open-sourced stable diffusion 2.1 model (sta) on the Objaverse dataset (Deitke et al., 2023) and LAION dataset (Schuhmann et al., 2022) for experiments. |
| Dataset Splits | Yes | We randomly choose 1,000 subjects from the held-out validation set and generate 4-view images using the given prompts and camera parameters. |
| Hardware Specification | Yes | The training takes about 3 days on 32 Nvidia Tesla A100 GPUs. The SDS process takes about 1.5 hours on a Tesla V100 GPU with shading and 1 hour without shading. |
| Software Dependencies | Yes | We fine-tune our model from the Stable Diffusion v2.1 base model (512×512 resolution) (sta)... For multi-view SDS, we implement our multi-view diffusion guidance in the threestudio (thr) library... |
| Experiment Setup | Yes | We use a reduced image size of 256×256 and a total batch size of 1,024 (4,096 images) for training and fine-tune the model for 50,000 steps. ... The 3D model is optimized for 10,000 steps with an AdamW optimizer (Kingma & Ba, 2014) at a learning rate of 0.01. For SDS, the maximum and minimum time steps are decreased from 0.98 to 0.5 and 0.02, respectively, over the first 8,000 steps. (The timestep annealing is sketched after this table.) |
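
The Pseudocode row cites Algorithm 1 but the table does not reproduce it. As a hedged illustration only, the PyTorch sketch below shows the kind of joint training step the paper describes: each step draws either a plain 2D text-image batch (no camera conditioning) or a batch of rendered multi-view images with camera embeddings, and applies a standard epsilon-prediction diffusion loss. All names here (`eps_model`, `batch_3d`, `batch_2d`, `cam_emb`) and the `eps_model(x_t, t, **cond)` signature are assumptions, not the authors' code, and the 2D-batch probability `p_2d` is an illustrative stand-in for the paper's mixing ratio.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, cond, alphas_cumprod):
    """Standard epsilon-prediction diffusion loss on a clean batch x0."""
    b = x0.shape[0]
    # Sample a discrete timestep per example.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    # Forward-noise x0 and ask the model to predict the injected noise.
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    return F.mse_loss(eps_model(x_t, t, **cond), noise)

def mvdream_train_step(eps_model, batch_3d, batch_2d, alphas_cumprod, p_2d=0.3):
    """One joint training step (illustrative, not the authors' Algorithm 1):
    with probability p_2d, train on a plain 2D text-image batch without
    camera conditioning; otherwise train on a 4-view rendered batch
    (views flattened into the batch dimension) with camera embeddings."""
    if torch.rand(()) < p_2d:
        x0 = batch_2d["images"]                      # (B, C, H, W)
        cond = {"context": batch_2d["text_emb"]}
    else:
        x0 = batch_3d["views"]                       # (B * 4, C, H, W)
        cond = {"context": batch_3d["text_emb"],
                "camera": batch_3d["cam_emb"]}       # per-view extrinsics embedding
    return ddpm_loss(eps_model, x0, cond, alphas_cumprod)
```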
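
The SDS schedule quoted in the Experiment Setup row can also be made concrete. The sketch below follows one reading of that sentence: both bounds start at 0.98, the maximum anneals to 0.5 and the minimum to 0.02 over the first 8,000 steps, after which the window is frozen and the timestep is sampled uniformly inside it. The linear interpolation and the function names are assumptions for illustration; the paper does not specify the interpolation shape.

```python
import random

def sds_t_range(step, anneal_steps=8000, t_start=0.98,
                t_max_end=0.5, t_min_end=0.02):
    """Linearly anneal the SDS timestep window over the first
    `anneal_steps` optimization steps, then hold it fixed."""
    frac = min(step / anneal_steps, 1.0)
    t_max = t_start + frac * (t_max_end - t_start)  # 0.98 -> 0.5
    t_min = t_start + frac * (t_min_end - t_start)  # 0.98 -> 0.02
    return t_min, t_max

def sample_sds_t(step):
    """Draw a timestep uniformly from the current annealed window."""
    t_min, t_max = sds_t_range(step)
    return random.uniform(t_min, t_max)
```

At step 0 this samples t near 0.98 (heavy noise, coarse structure); by step 8,000 it samples anywhere in [0.02, 0.5], which matches the usual coarse-to-fine motivation for timestep annealing in score distillation.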