MVDream: Multi-view Diffusion for 3D Generation
Authors: Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on three tasks: (1) multi-view image generation for evaluating image quality and consistency (Sec. 4.1), (2) 3D (NeRF) generation with multi-view score distillation as a main downstream task (Sec. 4.2), and (3) DreamBooth for personalized 3D generation (Sec. 4.3). |
| Researcher Affiliation | Collaboration | ¹ByteDance, USA; ²University of California, San Diego. {yichun.shi,peng.wang,kejie.li,xiao.yang}@bytedance.com, {jianglong.yeh,mai.t.long88}@gmail.com |
| Pseudocode | Yes | Algorithm 1: Pseudocode for MVDream training (an illustrative sketch of such a training step appears after this table) |
| Open Source Code | No | Our project page is https://MV-Dream.github.io. Besides, we will release our code as well as model checkpoints publicly after the paper submission. The latter statement promises a future release rather than current availability, and the project page does not explicitly state that it hosts the code for the paper. |
| Open Datasets | Yes | We fine-tune the open-sourced stable diffusion 2.1 model (sta) on the Objaverse dataset (Deitke et al., 2023) and LAION dataset (Schuhmann et al., 2022) for experiments. |
| Dataset Splits | Yes | We randomly choose 1,000 subjects from the held-out validation set and generate 4-view images using the given prompts and camera parameters. |
| Hardware Specification | Yes | The training takes about 3 days on 32 Nvidia Tesla A100 GPUs. The SDS process takes about 1.5 hours on a Tesla V100 GPU with shading and 1 hour without shading. |
| Software Dependencies | Yes | We fine-tune our model from the Stable Diffusion v2.1 base model (512×512 resolution) (sta)... For multi-view SDS, we implement our multi-view diffusion guidance in the threestudio (thr) library... |
| Experiment Setup | Yes | We use a reduced image size of 256×256 and a total batch size of 1,024 (4,096 images) for training and fine-tune the model for 50,000 steps. ... The 3D model is optimized for 10,000 steps with an AdamW optimizer (Kingma & Ba, 2014) at a learning rate of 0.01. For SDS, the maximum and minimum time steps are decreased from 0.98 to 0.5 and 0.02, respectively, over the first 8,000 steps. (The timestep annealing is sketched after this table.) |
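
The Pseudocode row cites Algorithm 1 but the table does not reproduce it. As a hedged illustration only, the PyTorch sketch below shows the kind of joint training step the paper describes: each step draws either a plain 2D text-image batch (no camera conditioning) or a batch of rendered multi-view images with camera embeddings, and applies a standard epsilon-prediction diffusion loss. All names here (`eps_model`, `batch_3d`, `batch_2d`, `cam_emb`) and the `eps_model(x_t, t, **cond)` signature are assumptions, not the authors' code, and the 2D-batch probability `p_2d` is an illustrative stand-in for the paper's mixing ratio.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, cond, alphas_cumprod):
    """Standard epsilon-prediction diffusion loss on a clean batch x0."""
    b = x0.shape[0]
    # Sample a discrete timestep per example.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    # Forward-noise x0 and ask the model to predict the injected noise.
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    return F.mse_loss(eps_model(x_t, t, **cond), noise)

def mvdream_train_step(eps_model, batch_3d, batch_2d, alphas_cumprod, p_2d=0.3):
    """One joint training step (illustrative, not the authors' Algorithm 1):
    with probability p_2d, train on a plain 2D text-image batch without
    camera conditioning; otherwise train on a 4-view rendered batch
    (views flattened into the batch dimension) with camera embeddings."""
    if torch.rand(()) < p_2d:
        x0 = batch_2d["images"]                      # (B, C, H, W)
        cond = {"context": batch_2d["text_emb"]}
    else:
        x0 = batch_3d["views"]                       # (B * 4, C, H, W)
        cond = {"context": batch_3d["text_emb"],
                "camera": batch_3d["cam_emb"]}       # per-view extrinsics embedding
    return ddpm_loss(eps_model, x0, cond, alphas_cumprod)
```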
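
The SDS schedule quoted in the Experiment Setup row can also be made concrete. The sketch below follows one reading of that sentence: both bounds start at 0.98, the maximum anneals to 0.5 and the minimum to 0.02 over the first 8,000 steps, after which the window is frozen and the timestep is sampled uniformly inside it. The linear interpolation and the function names are assumptions for illustration; the paper does not specify the interpolation shape.

```python
import random

def sds_t_range(step, anneal_steps=8000, t_start=0.98,
                t_max_end=0.5, t_min_end=0.02):
    """Linearly anneal the SDS timestep window over the first
    `anneal_steps` optimization steps, then hold it fixed."""
    frac = min(step / anneal_steps, 1.0)
    t_max = t_start + frac * (t_max_end - t_start)  # 0.98 -> 0.5
    t_min = t_start + frac * (t_min_end - t_start)  # 0.98 -> 0.02
    return t_min, t_max

def sample_sds_t(step):
    """Draw a timestep uniformly from the current annealed window."""
    t_min, t_max = sds_t_range(step)
    return random.uniform(t_min, t_max)
```

At step 0 this samples t near 0.98 (heavy noise, coarse structure); by step 8,000 it samples anywhere in [0.02, 0.5], which matches the usual coarse-to-fine motivation for timestep annealing in score distillation.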