Understanding and Improving Training-free Loss-based Diffusion Guidance

Authors: Yifei Shen, Xinyang Jiang, Yifan Yang, Yezhen Wang, Dongqi Han, Dongsheng Li

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in image and motion generation confirm the efficacy of these techniques. In this section, we evaluate the efficacy of our proposed techniques across various diffusion models and guidance conditions. We compare our methods with established baselines: Universal Guidance (UG) [2], Loss-Guided Diffusion with Monte Carlo (LGD-MC) [37], Training-Free Energy-Guided Diffusion Models (FreeDoM) [49], and Manifold Preserving Guided Diffusion (MPGD) [16]. |
| Researcher Affiliation | Collaboration | Yifei Shen¹, Xinyang Jiang¹, Yifan Yang¹, Yezhen Wang², Dongqi Han¹, Dongsheng Li¹ (¹Microsoft Research Asia; ²National University of Singapore) |
| Pseudocode | Yes | Algorithm 1 (Random Augmentation), Algorithm 2 (Polyak Step Size), Algorithm 3 (Time Travel); hedged sketches of all three appear below the table. |
| Open Source Code | Yes | The code is available at https://github.com/BIGKnight/Understanding-Training-free-Diffusion-Guidance |
| Open Datasets | Yes | Specifically, we utilize the CelebA-HQ diffusion model [19] to generate high-quality facial images. For the unconditional ImageNet diffusion, we employ text guidance in line with the approach used in FreeDoM and UG [2, 49]. In this subsection, we extend our evaluation to human motion generation using the Motion Diffusion Model (MDM) [40], which represents motion through a sequence of joint coordinates and is trained on a large corpus of text-motion pairs with classifier-free guidance. |
| Dataset Splits | No | The paper mentions using pre-trained models and datasets such as CelebA-HQ, ImageNet, and MDM, but does not provide specific train/validation/test split percentages or counts for its experiments. |
| Hardware Specification | Yes | These experiments were conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not list specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We implement the Polyak step size within the training-free guidance framework FreeDoM [49] and benchmark this implementation using the DDIM sampler with 50 steps. For the sampling method, DDIM with 100 steps is adopted as in [49, 37]. In FreeDoM and MPGD-Z, resampling is conducted for time steps ranging from 800 to 300, with the time-travel number fixed at 10, as described in [49] (see the time-travel sketch below). |
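To make the three listed algorithms concrete, the sketches below illustrate each one under assumed interfaces; none of them is the authors' code. First, a minimal sketch of random-augmentation guidance (Algorithm 1): the guidance loss is averaged over a few randomly augmented copies of the predicted clean sample before differentiating, which is meant to smooth the loss seen by the guidance gradient. The specific TorchVision transforms and the helper name `augmented_guidance_grad` are assumptions for illustration.

```python
import torch
import torchvision.transforms as T

def augmented_guidance_grad(x0_hat, loss_fn, n_aug=4):
    """Sketch of random-augmentation guidance (assumed interface).

    Averages the guidance loss over `n_aug` randomly augmented copies of
    the predicted clean image `x0_hat` before differentiating.
    """
    # Example differentiable augmentations; the paper's exact choice may differ.
    augment = T.Compose([
        T.RandomHorizontalFlip(),
        T.RandomResizedCrop(x0_hat.shape[-1], scale=(0.8, 1.0), antialias=True),
    ])
    x0_hat = x0_hat.detach().requires_grad_(True)
    loss = sum(loss_fn(augment(x0_hat)) for _ in range(n_aug)) / n_aug
    return torch.autograd.grad(loss, x0_hat)[0]
```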
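Next, the Polyak step size (Algorithm 2), which the paper benchmarks inside FreeDoM with a 50-step DDIM sampler. A minimal sketch, assuming `loss_fn` is the differentiable guidance loss (e.g., a face-ID or CLIP loss on the predicted clean image) and that its optimal value is zero:

```python
import torch

def polyak_guidance_step(x_t, loss_fn, loss_min=0.0, eps=1e-8):
    """Sketch of one guidance update with the Polyak step size.

    The Polyak rule sets the step to (L - L*) / ||grad L||^2, so the
    update shrinks automatically as the guidance loss approaches its
    assumed optimum `loss_min` (taken as 0 here).
    """
    x_t = x_t.detach().requires_grad_(True)
    loss = loss_fn(x_t)                        # scalar guidance loss
    grad = torch.autograd.grad(loss, x_t)[0]
    step = (loss.detach() - loss_min) / (grad.pow(2).sum() + eps)
    return (x_t - step * grad).detach()
```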
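Finally, the time-travel (resampling) trick (Algorithm 3). The time-step window (800 down to 300) and the repetition count of 10 follow the configuration reported in the setup row; the `denoise_step` callable and the one-step DDPM forward kernel used for re-noising are assumptions for illustration.

```python
import torch

def time_travel_step(x_t, t, denoise_step, alphas_cumprod,
                     travel_num=10, window=(300, 800)):
    """Sketch of time travel: repeat a guided reverse step by re-noising.

    Within `window`, the reverse step t -> t-1 is repeated `travel_num`
    times; after each repeat, x_{t-1} is pushed back to step t with the
    one-step forward kernel q(x_t | x_{t-1}), giving the guidance more
    opportunities to steer the sample. `alphas_cumprod` is the 1-D tensor
    of cumulative alphas from the DDPM noise schedule.
    """
    repeats = travel_num if window[0] <= t <= window[1] else 1
    beta_t = 1.0 - alphas_cumprod[t] / alphas_cumprod[t - 1]
    for i in range(repeats):
        x_prev = denoise_step(x_t, t)          # guided reverse step
        if i < repeats - 1:                    # re-noise except on the last pass
            noise = torch.randn_like(x_prev)
            x_t = torch.sqrt(1.0 - beta_t) * x_prev + torch.sqrt(beta_t) * noise
    return x_prev
```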