Understanding and Improving Training-free Loss-based Diffusion Guidance
Authors: Yifei Shen, Xinyang Jiang, Yifan Yang, Yezhen Wang, Dongqi Han, Dongsheng Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in image and motion generation confirm the efficacy of these techniques. In this section, we evaluate the efficacy of our proposed techniques across various diffusion models and guidance conditions. We compare our methods with established baselines: Universal Guidance (UG) [2], Loss-Guided Diffusion with Monte Carlo (LGD-MC) [37], Training-Free Energy-Guided Diffusion Models (FreeDoM) [49], and Manifold Preserving Guided Diffusion (MPGD) [16]. |
| Researcher Affiliation | Collaboration | Yifei Shen¹, Xinyang Jiang¹, Yifan Yang¹, Yezhen Wang², Dongqi Han¹, Dongsheng Li¹; ¹Microsoft Research Asia, ²National University of Singapore |
| Pseudocode | Yes | Algorithm 1 Random Augmentation, Algorithm 2 Polyak Step Size, Algorithm 3 Time Travel (hedged sketches of Algorithms 2 and 3 follow the table) |
| Open Source Code | Yes | The code is available at https://github.com/BIGKnight/Understanding-Training-free-Diffusion-Guidance |
| Open Datasets | Yes | Specifically, we utilize the CelebA-HQ diffusion model [19] to generate high-quality facial images. For the unconditional ImageNet diffusion, we employ text guidance in line with the approach used in FreeDoM and UG [2, 49]. In this subsection, we extend our evaluation to human motion generation using the Motion Diffusion Model (MDM) [40], which represents motion through a sequence of joint coordinates and is trained on a large corpus of text-motion pairs with classifier-free guidance. |
| Dataset Splits | No | The paper mentions using pre-trained models and datasets like CelebA-HQ, ImageNet, and MDM, but does not provide specific train/validation/test split percentages or numbers for their experiments. |
| Hardware Specification | Yes | These experiments were conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not explicitly list specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We implement Polyak step size within the context of a training-free guidance framework called FreeDoM [49] and benchmark the performance of this implementation using the DDIM sampler with 50 steps. For the sampling method, DDIM with 100 steps is adopted as in [49, 37]. In FreeDoM and MPGD-Z, resampling is conducted for time steps ranging from 800 to 300, with the time-travel number fixed at 10, as described in [49]. (A sketch of this time-travel loop follows the table.) |
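To make the Polyak step size row concrete, below is a minimal PyTorch sketch of a single training-free guidance update using a Polyak step size, in the spirit of Algorithm 2. It assumes the guidance loss has an optimal value of 0, so the Polyak rule reduces to step = loss / ||grad||²; `x0_pred_fn` and `loss_fn` are hypothetical placeholders (e.g., a denoiser-based clean-sample estimate and a differentiable guidance loss), not the paper's released code.

```python
import torch

def polyak_guidance_update(x_t, x0_pred_fn, loss_fn, eps=1e-8):
    """One training-free guidance update with a Polyak step size (sketch).

    Assumes the optimal loss value is 0, so the Polyak step is
    loss / ||grad||^2. `x0_pred_fn` maps the noisy sample x_t to a
    predicted clean sample; `loss_fn` returns a scalar guidance loss.
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = x0_pred_fn(x_t)                 # predicted clean sample
    loss = loss_fn(x0_hat)                   # scalar guidance loss
    grad, = torch.autograd.grad(loss, x_t)   # gradient w.r.t. the noisy sample
    step = loss.detach() / (grad.pow(2).sum() + eps)  # Polyak step size
    return (x_t - step * grad).detach()

# Toy usage: pull samples toward zero-mean images (identity stands in
# for the denoiser-based x0 prediction).
x_t = torch.randn(4, 3, 64, 64)
x0_pred_fn = lambda x: x
loss_fn = lambda x0: x0.mean(dim=(1, 2, 3)).pow(2).sum()
x_t = polyak_guidance_update(x_t, x0_pred_fn, loss_fn)
```

The appeal of the Polyak rule here is that the step size adapts automatically: it shrinks as the guidance loss approaches its optimum, removing one hand-tuned hyperparameter from the sampler.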
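The resampling schedule in the experiment setup row (time travel for diffusion timesteps 800 down to 300, travel number fixed at 10) can be sketched as the loop below. This is an illustrative reconstruction, not the released implementation: `guided_step` is a hypothetical callable performing one guided reverse step `x_t -> x_{t-1}`, and `betas` is the forward-process noise schedule indexed by diffusion timestep.

```python
import torch

def sample_with_time_travel(guided_step, x_T, betas,
                            travel_lo=300, travel_hi=800, travel_num=10):
    """Reverse sampling loop with time-travel resampling (hedged sketch).

    For timesteps t in [travel_lo, travel_hi], each guided reverse step
    is repeated `travel_num` times: after producing x_{t-1}, the sample
    is pushed back to x_t through one forward diffusion step and the
    reverse step is redone, following the time-travel strategy the
    paper attributes to FreeDoM [49].
    """
    x = x_T
    for t in reversed(range(len(betas))):
        repeats = travel_num if travel_lo <= t <= travel_hi else 1
        for r in range(repeats):
            x = guided_step(x, t)  # guided reverse step: x_t -> x_{t-1}
            if r < repeats - 1:    # travel back: x_{t-1} -> x_t, then redo
                noise = torch.randn_like(x)
                x = torch.sqrt(1.0 - betas[t]) * x + torch.sqrt(betas[t]) * noise
    return x
```

Note that the 800-to-300 window refers to diffusion timesteps on the original 1000-step schedule, not to the 100 DDIM sampler steps; a real implementation would map sampler steps to these timesteps before applying the window.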