HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

Authors: Junzhe Zhu, Peiye Zhuang, Sanmi Koyejo

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the superiority of our method over previous approaches, enabling the generation of highly detailed and view-consistent 3D assets through a single-stage training process."
Researcher Affiliation | Collaboration | Junzhe Zhu¹, Peiye Zhuang¹,², Sanmi Koyejo¹ (¹Stanford University, ²Snap Inc.)
Pseudocode | Yes | "Algorithm 1: Training Procedure" (see the training-loop sketch after this table)
Open Source Code | No | "Our approach is implemented based on a publicly available repository." (footnote: https://github.com/ashawkey/stable-dreamfusion/tree/main) The paper states that the approach is *based on* a public repository, but does not explicitly state that the modified code for this work is released or linked.
Open Datasets | No | The paper uses pre-trained models (e.g., Stable Diffusion (Rombach et al., 2022)) but does not identify a publicly available dataset on which the method itself is trained or fine-tuned; instead, the method optimizes a 3D representation from a text prompt using guidance from these pre-trained models.
Dataset Splits | No | The paper provides no training/validation/test splits; it describes a per-prompt optimization procedure rather than traditional dataset-based training.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU or CPU models) used to run its experiments. It mentions instant-ngp for positional encoding but gives no hardware details.
Software Dependencies | No | The paper cites several software components (Adam, DDIM, the Stable Diffusion model, the DeepFloyd IF model, T5-XXL, and instant-ngp) but does not provide version numbers for these components or libraries.
Experiment Setup | Yes | "Training setup. We use Adam (Kingma & Ba, 2015) with a learning rate of 10^-2 for the instant-ngp encoding and 10^-3 for the NeRF weights. In practice, we choose total_iter as 10^4 iterations. The rendering resolution is 512 × 512. We employ DDIM (Song et al., 2021) with empirically chosen parameters r = 0.25 and η = 1 to accelerate training. We choose the hyper-parameters λ_rgb = 0.1, λ_d = 0.1, and λ_zvar = 3. Similar to prior work (Poole et al., 2022; Lin et al., 2023; Wang et al., 2023a), we use a classifier-free guidance (Ho & Salimans, 2022) scale of 100 for our diffusion model."
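
To connect the "Pseudocode" and "Experiment Setup" rows, the quoted hyper-parameters can be arranged into a single-stage training loop. The following PyTorch-style sketch is illustrative only: the class and helper names (TextTo3DNeRF, DiffusionGuidance, sample_random_camera) are hypothetical stand-ins for the authors' stable-dreamfusion-based implementation, and only the numeric settings (learning rates, iteration count, resolution, DDIM parameters, CFG scale, loss weights) are taken from the paper.

```python
# Minimal sketch of the single-stage training procedure (Algorithm 1) as
# summarized in the "Experiment Setup" row above. Every class and helper
# name is a hypothetical stand-in, not the authors' released code.
import torch

TOTAL_ITERS = 10_000                 # total_iter = 10^4
CFG_SCALE = 100.0                    # classifier-free guidance scale
L_RGB, L_D, L_ZVAR = 0.1, 0.1, 3.0   # lambda_rgb, lambda_d, lambda_zvar

prompt = "a text prompt describing the target 3D asset"   # placeholder
nerf = TextTo3DNeRF()                # hypothetical: instant-ngp encoding + NeRF MLP
guidance = DiffusionGuidance(        # hypothetical wrapper around a pre-trained
    prompt, cfg_scale=CFG_SCALE,     # diffusion model, using DDIM with the
    ddim_r=0.25, ddim_eta=1.0,       # paper's r = 0.25 and eta = 1
)

# Two Adam parameter groups, mirroring the paper's split:
# 1e-2 for the instant-ngp encoding, 1e-3 for the NeRF weights.
optimizer = torch.optim.Adam([
    {"params": nerf.encoder_parameters(), "lr": 1e-2},
    {"params": nerf.mlp_parameters(), "lr": 1e-3},
])

for step in range(TOTAL_ITERS):
    camera = sample_random_camera()                    # hypothetical helper
    out = nerf.render(camera, resolution=(512, 512))   # rgb, depth, per-ray z-variance

    # The three weighted terms stand in for the paper's image-space,
    # depth, and z-variance losses; their exact definitions are in the paper.
    loss = (L_RGB * guidance.image_loss(out["rgb"])
            + L_D * guidance.depth_loss(out["depth"])
            + L_ZVAR * out["z_variance"].mean())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Note the two optimizer parameter groups: the paper assigns distinct learning rates to the instant-ngp encoding (10^-2) and the remaining NeRF weights (10^-3), which is the only per-parameter distinction in the quoted setup; the rest of the loop is a direct transcription of the quoted hyper-parameters.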