BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

Authors: Yonghao Yu, Shunan Zhu, Huai Qin, Haorui Li

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on different differentiable 3D representations, revealing that BoostDream excels in generating high-quality 3D assets rapidly and overcomes the Janus problem that affects conventional SDS-based methods, marking a substantial advancement in both the efficiency and quality of 3D generation. The experiments include refinement and comparison studies.
Researcher Affiliation | Collaboration | Yonghao Yu1, Shunan Zhu1, Huai Qin1 and Haorui Li2; 1Waseda University, 2Southeast University; yuyonghao@suou.waseda.jp, {shunan-zhu, mizuki qin}@ruri.waseda.jp, lihaorui.lhr@alibaba-inc.com
Pseudocode | No | The paper describes its processes but does not include any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper cites `https://boostdream.github.io/` as [Yonghao et al., 2024], which is a project website. The provided text contains no explicit statement that the authors' own source code is released and no direct link to a code repository for BoostDream.
Open Datasets | Yes | Consider the datasets used in 2D image generation tasks: LAION-5B [Schuhmann et al., 2022] contains more than 5 billion image-text pairs, while the largest available 3D dataset, Objaverse-XL [Deitke et al., 2023], carries only 10 million 3D assets with lower-quality captions.
Dataset Splits | No | The paper describes the total number of iterations for its refining process (e.g., 4800 iterations), but it does not specify any explicit train/validation/test dataset splits for its experiments or method evaluation.
Hardware Specification | Yes | All the experiments in this paper are conducted on a single NVIDIA V100 GPU with 32 GB VRAM.
Software Dependencies | Yes | We use ControlNet 1.1 with Stable Diffusion 1.5 [Zhang et al., 2023b] as the diffusion model. (A hedged loading sketch is given after the table.)
Experiment Setup | Yes | After initialization, 4800 iterations are set for the refining process, of which the first 1800 iterations are under the guidance of the coarse 3D assets and the subsequent 3000 iterations are guided by the differentiable rendering result itself. In the experimental settings, the rotation angle is defined as $\alpha = 90^\circ$, so the camera at position $p_0$ rotates around axis $a$ to give four camera positions. The total objective is $L_{\text{total}} = L_{\text{MV-SDS}}(\phi, x) + \alpha L_{\text{orient}} + \beta L_{\text{opacity}}$ (Eq. 13), where $\alpha$ and $\beta$ are the weights for the orientation and opacity losses, respectively. (A code sketch of the camera placement and combined loss follows the table.)
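
The stated software stack (ControlNet 1.1 with Stable Diffusion 1.5) could be loaded roughly as in the minimal sketch below, assuming the Hugging Face diffusers library; the checkpoint identifiers are assumptions, since the paper does not name them, and this is not the authors' code.

```python
# Minimal sketch (not the authors' code): loading ControlNet 1.1 with
# Stable Diffusion 1.5 via the Hugging Face diffusers library.
# The checkpoint IDs below are assumptions; the paper does not list them.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",  # assumed ControlNet 1.1 (depth) checkpoint
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",        # Stable Diffusion 1.5 base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")                                 # paper reports a single NVIDIA V100 (32 GB)
```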
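
The experiment-setup row can be illustrated with a short sketch of the four-view camera placement (rotating the initial position p0 around axis a by the stated α = 90°) and the weighted total loss of Eq. (13). This is a sketch under assumptions: the function names (`four_view_cameras`, `total_loss`) and all numeric example values are hypothetical, and the individual loss terms are passed in as already-computed scalars.

```python
import numpy as np

def rotation_matrix(axis, angle_deg):
    """Rodrigues' formula: rotation matrix about a unit axis by angle_deg degrees."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    t = np.deg2rad(angle_deg)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)

def four_view_cameras(p0, axis, alpha=90.0):
    """Rotate camera position p0 around `axis` by 0, alpha, 2*alpha, 3*alpha degrees."""
    p0 = np.asarray(p0, dtype=float)
    return [rotation_matrix(axis, k * alpha) @ p0 for k in range(4)]

def total_loss(l_mv_sds, l_orient, l_opacity, alpha_w, beta_w):
    """Eq. (13): L_total = L_MV-SDS + alpha * L_orient + beta * L_opacity."""
    return l_mv_sds + alpha_w * l_orient + beta_w * l_opacity

# Example usage with assumed values (the paper does not report these numbers).
p0 = np.array([0.0, 0.0, 4.0])      # hypothetical initial camera position
a = np.array([0.0, 1.0, 0.0])       # hypothetical rotation axis
cameras = four_view_cameras(p0, a)  # four positions, 90 degrees apart
loss = total_loss(1.0, 0.1, 0.05, alpha_w=0.5, beta_w=0.5)  # placeholder scalars
```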