SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D

Authors: Weiyu Li, Rui Chen, Xuelin Chen, Ping Tan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the qualitative and quantitative evaluation of the text-to-3D pipelines as described in Section 3.2, as well as comparison results against other text-to-3D baseline methods.
Researcher Affiliation | Collaboration | 1 Hong Kong University of Science and Technology, 2 Light Illusions, 3 South China University of Technology, 4 Tencent AI Lab
Pseudocode | No | The paper describes the method in prose and provides diagrams, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code.
Open Source Code | Yes | We implement it in threestudio (Guo et al., 2023), which implements a diverse set of state-of-the-art text-to-3D generation pipelines. (A minimal usage sketch follows this table.)
Open Datasets | Yes | We use a public 3D dataset, Objaverse (Deitke et al., 2023), which contains around 800k models created by artists, to generate the data for fine-tuning.
Dataset Splits | No | The paper mentions using Objaverse for fine-tuning and then evaluating on 80 randomly selected text prompts. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for the Objaverse dataset itself as used in their experiments.
Hardware Specification | Yes | The entire fine-tuning process takes approximately 2 days using 8 V100 GPUs for 100k steps.
Software Dependencies | Yes | By default, we conduct experiments based on the Stable Diffusion model (we use v2.1).
Experiment Setup | Yes | We use the default parameters as in Diffusers, including setting the learning rate to 1e-5 with the constant scheduler, and a batch size of 96 per GPU with 4 gradient accumulation steps. (A configuration sketch follows this table.)
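
The Open Source Code row notes that the method is implemented in threestudio, which exposes its text-to-3D pipelines through a single launcher script. The sketch below shows how such a pipeline is typically invoked from Python; the config filename and prompt are illustrative assumptions, not values taken from the paper, so consult the threestudio and SweetDreamer repositories for the actual configuration names.

```python
# Illustrative only: launching a text-to-3D run through threestudio's launcher.
# The config path and prompt are assumptions, not the authors' published settings.
import subprocess

subprocess.run(
    [
        "python", "launch.py",                    # threestudio entry point
        "--config", "configs/sweetdreamer.yaml",  # hypothetical config file name
        "--train",                                # run the per-prompt optimization stage
        "--gpu", "0",
        "system.prompt_processor.prompt=a DSLR photo of an astronaut riding a horse",
    ],
    check=True,
)
```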
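
The Software Dependencies, Hardware Specification, and Experiment Setup rows together pin down the reported fine-tuning recipe: Stable Diffusion v2.1 as the base model, Diffusers defaults, a learning rate of 1e-5 with a constant schedule, a per-GPU batch size of 96 with 4 gradient accumulation steps, and 100k steps on 8 V100 GPUs. The Python sketch below illustrates that configuration with Diffusers and PyTorch; the dataset and loss are omitted, and the snippet is a hedged illustration of the stated hyperparameters rather than the authors' training code.

```python
# Sketch of the reported fine-tuning configuration (not the authors' code):
# Stable Diffusion v2.1, lr 1e-5 with a constant schedule, batch size 96 per GPU,
# 4 gradient-accumulation steps, 100k optimization steps.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.optimization import get_constant_schedule

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float32
)
unet = pipe.unet  # the denoiser being fine-tuned on the rendered Objaverse data

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)
lr_scheduler = get_constant_schedule(optimizer)  # constant scheduler, as reported

per_gpu_batch_size = 96
gradient_accumulation_steps = 4
max_train_steps = 100_000  # "100k steps" from the hardware description

# Training loop outline (dataloader and diffusion loss omitted):
# for step, batch in enumerate(dataloader):
#     loss = diffusion_loss(unet, batch) / gradient_accumulation_steps
#     loss.backward()
#     if (step + 1) % gradient_accumulation_steps == 0:
#         optimizer.step()
#         lr_scheduler.step()
#         optimizer.zero_grad()
```

If the accumulation is applied per GPU, as in the Diffusers training examples, the effective batch size across the 8 GPUs would be 96 × 4 × 8 = 3072 samples per optimizer step.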