Accelerating Guided Diffusion Sampling with Splitting Numerical Methods

Authors: Suttisak Wizadwongsa, Supasorn Suwajanakorn

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 EXPERIMENTS: Extending on our observation that classical high-order methods failed on guided sampling, we conducted a series of experiments to investigate this problem and evaluate our solution. ... We measure image similarity using Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018) (lower is better) and measure sampling time on a single NVIDIA RTX 3090 and a 24-core AMD Threadripper 3960x.
Researcher Affiliation | Academia | Suttisak Wizadwongsa, Supasorn Suwajanakorn VISTEC, Thailand {suttisak.w_s19, supasorn.s}@vistec.ac.th
Pseudocode | Yes | Algorithm 1: Lie-Trotter Splitting (LTSP): sample $x_0 \sim \mathcal{N}(0, \sigma_{\max}^2 I)$; for $n \in \{0, \dots, N-1\}$ do $y_{n+1} = \mathrm{PLMS}(x_n, \sigma_n, \sigma_{n+1}, \epsilon_\sigma)$; $x_{n+1} = y_{n+1} - (\sigma_{n+1} - \sigma_n)\,\nabla f(y_{n+1})$; end. Result: $x_N$
Open Source Code | No | No specific statement or link provided regarding the release of their own source code for the methodology.
Open Datasets | Yes | For our comparison, we use pre-trained state-of-the-art diffusion models and classifiers from Dhariwal & Nichol (2021), which were trained on the ImageNet dataset (Russakovsky et al., 2015) with 1,000 total sampling steps.
Dataset Splits | No | Not found. The paper mentions pre-trained models and evaluates on a test set but does not provide explicit train/validation/test splits for its own experimental setup.
Hardware Specification | Yes | We measure image similarity using Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018) (lower is better) and measure sampling time on a single NVIDIA RTX 3090 and a 24-core AMD Threadripper 3960x.
Software Dependencies | No | No specific software versions (e.g., Python 3.x, PyTorch 1.x) are mentioned. Only methods and models are cited.
Experiment Setup | Yes | We treat full-path samples from a classifier-guided DDIM at 1,000 steps as reference solutions. Then, the performance of each configuration is measured by the image similarity between its generated samples using fewer steps and the reference DDIM samples, both starting from the same initial noise map. ... Following (Dhariwal & Nichol, 2021), we use a 25-step DDIM as a baseline, which already produces visually reasonable results. As PLMS and LTSP require the same number of network evaluations as the DDIM, they are used also with 25 steps. For STSP with a slower evaluation time, it is only allowed 20 steps, which is the highest number of steps such that its sampling time is within that of the baseline 25-step DDIM.
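The LTSP loop quoted in the Pseudocode row can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the page above notes that no source code was released). The names `ltsp_sample`, `euler_step`, `eps_fn`, and `grad_f` are hypothetical, and a plain Euler step stands in for PLMS in the diffusion substep so that the example stays short. The toy noise model and guidance gradient exist only to make the loop runnable.

```python
import numpy as np


def euler_step(x, sigma, sigma_next, eps_fn):
    """Stand-in for PLMS (assumption): one explicit Euler step of
    dx/dsigma = eps_fn(x, sigma)."""
    return x + (sigma_next - sigma) * eps_fn(x, sigma)


def ltsp_sample(eps_fn, grad_f, sigmas, shape, rng, diffusion_step=euler_step):
    """Lie-Trotter splitting: alternate a diffusion substep with an
    Euler substep on the guidance term, once per noise level."""
    # x_0 ~ N(0, sigma_max^2 I), with sigma_max = sigmas[0]
    x = sigmas[0] * rng.standard_normal(shape)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Substep 1: advance the unguided diffusion ODE.
        y = diffusion_step(x, sigma, sigma_next, eps_fn)
        # Substep 2: x_{n+1} = y_{n+1} - (sigma_{n+1} - sigma_n) * grad f(y_{n+1})
        x = y - (sigma_next - sigma) * grad_f(y)
    return x


# Toy setup (hypothetical): the "data" is a point mass at the origin, so the
# noise prediction is x / sigma, and the guidance function is
# f(y) = -0.5 * ||y - target||^2, whose gradient is target - y.
target = np.ones(4)
toy_eps = lambda x, sigma: x / sigma
toy_grad_f = lambda y: target - y

sigmas = np.linspace(1.0, 0.0, 11)  # decreasing noise schedule, 10 steps
sample = ltsp_sample(toy_eps, toy_grad_f, sigmas, (4,), np.random.default_rng(0))
```

Because the diffusion substep is passed in as a callable, a multistep solver such as PLMS could replace `euler_step` without touching the guidance substep, which is the separation that the splitting scheme relies on.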