DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow

Authors: Kyungmin Lee, Kihyuk Sohn, Jinwoo Shin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on human preference studies, we demonstrate that DreamFlow provides the most photorealistic 3D content compared to existing methods including DreamFusion (Poole et al., 2022), Magic3D (Lin et al., 2023), and ProlificDreamer (Wang et al., 2023b).
Researcher Affiliation | Collaboration | Kyungmin Lee (KAIST), Kihyuk Sohn (Google Research), Jinwoo Shin (KAIST)
Pseudocode | Yes | Algorithm 1: Approximate probability flow ODE (APFO). (An illustrative sketch of this style of optimization loop is given after this table.)
Open Source Code | No | The paper mentions a project page for visualizations of their method but does not provide a direct URL to the source code within the paper text.
Open Datasets | Yes | Throughout the experiments, we use the text prompts in the DreamFusion gallery (https://dreamfusion3d.github.io/gallery.html) to compare our method with the baseline methods DreamFusion (Poole et al., 2022), Magic3D (Lin et al., 2023), and ProlificDreamer (Wang et al., 2023b).
Dataset Splits | No | The paper uses text prompts from the DreamFusion gallery for optimization and evaluation but does not specify train/validation/test splits with percentages or counts, since it generates 3D content from text rather than training on a pre-split image dataset.
Hardware Specification | Yes | Note that we use a single A100 GPU for generating each 3D content.
Software Dependencies | No | The paper mentions using the Stable Diffusion 2.1 and Stable Diffusion XL Refiner models, but does not provide specific version numbers for general software dependencies such as Python, PyTorch, or CUDA. (A loading sketch for these checkpoints follows the table.)
Experiment Setup | Yes | For the NeRF optimization, we train for 4000 iterations at a resolution of 256, which takes 50 minutes. For mesh fine-tuning, we tune the geometry for 5000 iterations and the texture for 2000 iterations, taking 40 minutes in total. Lastly, we refine the mesh for 300 iterations, which takes 20 minutes. In sum, our method takes about 2 hours for a single 3D content generation. We use the AdamW optimizer (Loshchilov & Hutter, 2017), where we train the grid encoder with learning rate 1e-2, the color and density networks with learning rate 1e-3, and the background MLP with learning rate 1e-3 or 1e-4. (See the optimizer sketch after this table.)
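The Pseudocode row names Algorithm 1 (APFO) but does not reproduce it. Below is a minimal Python sketch of the general pattern such a loop follows (score-distillation-style updates of rendered images taken along a fixed, decreasing noise schedule); the function names, the denoiser interface, and the reuse of noise across steps are assumptions, and this is not a reproduction of the authors' Algorithm 1.

```python
import torch

def apfo_style_loop(render_fn, params, denoiser, prompt_emb, sigmas, lr=1e-2):
    """Illustrative sketch only, NOT the paper's Algorithm 1: the 3D scene
    parameters are updated by rendering an image, perturbing it to a scheduled
    noise level, and following the direction given by a text-conditioned
    diffusion denoiser."""
    opt = torch.optim.AdamW([params], lr=lr)
    noise = torch.randn_like(render_fn(params))      # noise reused across steps (assumption)
    for sigma in sigmas:                             # monotonically decreasing schedule (assumption)
        x = render_fn(params)                        # differentiable rendering of the scene
        x_noisy = x + sigma * noise
        with torch.no_grad():
            eps_pred = denoiser(x_noisy, sigma, prompt_emb)  # hypothetical denoiser interface
        # Surrogate loss whose gradient w.r.t. params is (eps_pred - noise) * dx/dparams,
        # i.e., an SDS-like step taken along the fixed noise schedule.
        loss = ((eps_pred - noise) * x).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params
```

Here `render_fn`, `denoiser`, and `prompt_emb` are placeholders for a differentiable renderer, a pretrained text-conditioned diffusion model, and a text embedding, respectively.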
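The Software Dependencies row notes that the paper names Stable Diffusion 2.1 and the Stable Diffusion XL Refiner without pinning library versions. A minimal sketch of how these public checkpoints are commonly loaded with Hugging Face diffusers is shown below; the paper does not state that diffusers was used, so the library choice is an assumption.

```python
import torch
from diffusers import StableDiffusionPipeline, DiffusionPipeline

# Public Hugging Face Hub checkpoints for the two models named in the paper;
# loading them via diffusers is an assumption, not something the paper states.
sd21 = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

sdxl_refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")
```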
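The Experiment Setup row quotes per-module learning rates (grid encoder 1e-2, color/density networks 1e-3, background MLP 1e-3 or 1e-4). The PyTorch sketch below shows one way to express those rates as AdamW parameter groups; the module definitions are toy placeholders, not the architectures from the authors' code.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the NeRF components; the real architectures (hash-grid
# encoder, color/density MLPs, background MLP) are not specified in the row above.
grid_encoder   = nn.Linear(3, 32)
color_net      = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))
density_net    = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
background_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

# AdamW with per-module learning rates matching the values quoted above.
optimizer = torch.optim.AdamW([
    {"params": grid_encoder.parameters(),   "lr": 1e-2},
    {"params": color_net.parameters(),      "lr": 1e-3},
    {"params": density_net.parameters(),    "lr": 1e-3},
    {"params": background_mlp.parameters(), "lr": 1e-3},  # the paper reports 1e-3 or 1e-4
])
```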