MVDream: Multi-view Diffusion for 3D Generation

Authors: Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang

ICLR 2024

Reproducibility checklist — each variable lists the result and the supporting LLM response:
Research Type: Experimental
"We evaluate our method on three tasks: (1) multi-view image generation for evaluating image quality and consistency (Sec. 4.1), (2) 3D (NeRF) generation with multi-view score distillation as a main downstream task (Sec. 4.2), and (3) DreamBooth for personalized 3D generation (Sec. 4.3)."
Researcher Affiliation: Collaboration
"1 ByteDance, USA; 2 University of California, San Diego. {yichun.shi,peng.wang,kejie.li,xiao.yang}@bytedance.com, {jianglong.yeh,mai.t.long88}@gmail.com"
Pseudocode: Yes
"Algorithm 1: Pseudocode for MVDream training"
Open Source Code: No
"Our project page is https://MV-Dream.github.io. Besides, we will release our code as well as model checkpoints publicly after the paper submission."
The latter statement promises a future release rather than current availability, and the project page does not explicitly state that it hosts the code for the paper.
Open Datasets: Yes
"We fine-tune the open-sourced stable diffusion 2.1 model (sta) on the Objaverse dataset (Deitke et al., 2023) and LAION dataset (Schuhmann et al., 2022) for experiments."
Dataset Splits: Yes
"We randomly choose 1,000 subjects from the held-out validation set and generate 4-view images using the given prompts and camera parameters."
Hardware Specification: Yes
"The training takes about 3 days on 32 Nvidia Tesla A100 GPUs." "The SDS process takes about 1.5 hours on a Tesla V100 GPU with shading and 1 hour without shading."
Software Dependencies: Yes
"We fine-tune our model from the Stable Diffusion v2.1 base model (512×512 resolution) (sta)..." "For multi-view SDS, we implement our multi-view diffusion guidance in the threestudio (thr) library..."
Experiment Setup: Yes
"We use a reduced image size of 256×256 and a total batch size of 1,024 (4,096 images) for training and fine-tune the model for 50,000 steps." ... "The 3D model is optimized for 10,000 steps with an AdamW optimizer (Kingma & Ba, 2014) at a learning rate of 0.01. For SDS, the maximum and minimum time steps are decreased from 0.98 to 0.5 and 0.02, respectively, over the first 8,000 steps."
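The quoted SDS timestep schedule can be read as a simple linear anneal of the sampling range. The sketch below is an illustrative interpretation, not the paper's code: the function name is ours, and the assumption that both bounds start at 0.98 before decaying to 0.5 (maximum) and 0.02 (minimum) is one plausible reading of the quote.

```python
def sds_timestep_range(step, warmup=8000, t_start=0.98,
                       t_max_end=0.5, t_min_end=0.02):
    """Linearly anneal the SDS timestep sampling range [t_min, t_max]
    over the first `warmup` optimization steps, then hold it fixed.

    Assumed reading: both bounds begin at `t_start` (0.98) and decay
    linearly to their final values quoted in the paper.
    """
    frac = min(step / warmup, 1.0)  # 0 -> 1 over the warmup phase
    t_max = t_start + frac * (t_max_end - t_start)
    t_min = t_start + frac * (t_min_end - t_start)
    return t_min, t_max
```

At each SDS iteration, a diffusion timestep would then be drawn uniformly from the returned [t_min, t_max] interval before computing the distillation gradient.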