Interpreting and Improving Diffusion Models from an Optimization Perspective

Authors: Frank Permenter, Chenyang Yuan

ICML 2024

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting excerpt from the paper (the LLM response).

Research Type: Experimental
LLM Response: "6. Experiments"

Researcher Affiliation: Industry
LLM Response: "Toyota Research Institute, Cambridge, Massachusetts, USA. Correspondence to: Chenyang Yuan <chenyang.yuan@tri.global>, Frank Permenter <frank.permenter@tri.global>."

Pseudocode: Yes
LLM Response: "Algorithm 1 DDIM sampler (Song et al., 2020a). Require: (σ_N, ..., σ_0), x_N ∼ N(0, I), ε_θ. Ensure: compute x_0 with N evaluations of ε_θ. For t = N, ..., 1 do: x_{t−1} ← x_t + (σ_{t−1} − σ_t) ε_θ(x_t, σ_t). Return x_0."

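To make the quoted update rule concrete, here is a minimal sketch of Algorithm 1 in PyTorch. The function and argument names (ddim_sample, eps_theta) are illustrative, not taken from the paper's repository, and the sketch assumes the initial sample is drawn at noise level σ_N, the usual convention for this σ-parameterization.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_theta, sigmas, shape, device="cpu"):
    """Minimal DDIM sampler following Algorithm 1.

    eps_theta(x, sigma): noise-prediction network (assumed interface).
    sigmas: decreasing noise schedule (sigma_N, ..., sigma_1, sigma_0).
    """
    # Draw x_N at the largest noise level sigma_N (assumption: the
    # sample is scaled by sigma_N in this sigma-parameterization).
    x = sigmas[0] * torch.randn(shape, device=device)
    for t in range(len(sigmas) - 1):
        sigma_t, sigma_prev = sigmas[t], sigmas[t + 1]
        # Algorithm 1 update: x_{t-1} = x_t + (sigma_{t-1} - sigma_t) * eps_theta(x_t, sigma_t)
        x = x + (sigma_prev - sigma_t) * eps_theta(x, sigma_t)
    return x  # x_0, the generated sample

```
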
Open Source Code: Yes
LLM Response: "Code for the experiments is available at https://github.com/ToyotaResearchInstitute/gradient-estimation-sampler"

Open Datasets: Yes
LLM Response: "We use denoisers from (Ho et al., 2020; Song et al., 2020a) that were pretrained on the CIFAR-10 (32x32) and CelebA (64x64) datasets (Krizhevsky et al., 2009; Liu et al., 2015)."

Dataset Splits: No
LLM Response: The paper mentions using "training images" and evaluating on the "MS COCO validation set", but it does not provide specific details on the train/validation/test splits used to train or evaluate its models.

Hardware Specification: Yes
LLM Response: "All the experiments were run on a single Nvidia RTX 4090 GPU."

Software Dependencies: Yes
LLM Response: "We also use Stable Diffusion 2.1 provided in https://huggingface.co/stabilityai/stable-diffusion-2-1."

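The paper does not give loading code for this checkpoint; as a hedged illustration, the referenced model can be loaded with Hugging Face's diffusers library roughly as follows (the pipeline class and arguments are standard diffusers usage, not taken from the paper):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the Stable Diffusion 2.1 checkpoint referenced above.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,  # half precision fits comfortably on one RTX 4090
).to("cuda")

# Generate and save a sample image (prompt is illustrative).
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("sample.png")
```
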
Experiment Setup: Yes
LLM Response: "For the CIFAR-10 and CelebA models, we choose σ_1 = √(σ_1^{DDIM(N)}) and σ_0 = 0.01. For CIFAR-10 N = 5, 10, 20, 50 we choose σ_N = 40, and for CelebA N = 5, 10, 20, 50 we choose σ_N = 40, 80, 100, 120 respectively. For Stable Diffusion, we use the same sigma schedule as that in DDIM. ... We found that setting γ = 2 works well for N < 20; for larger N, slightly increasing γ also improves sample quality (see Appendix E for more details)."
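This excerpt pins down only the schedule endpoints (σ_N, σ_1, and σ_0 = 0.01); the interior spacing is not specified here. As a hedged sketch under that caveat, the helper below (make_sigma_schedule is a hypothetical name) interpolates geometrically between the quoted endpoints, one common convention for sigma schedules:

```python
import numpy as np

def make_sigma_schedule(sigma_max, sigma_1, sigma_0=0.01, N=10):
    """Build a decreasing schedule (sigma_N, ..., sigma_1, sigma_0).

    The geometric interior spacing is an assumption; the quoted
    excerpt only fixes the endpoints.
    """
    sigmas = np.geomspace(sigma_max, sigma_1, N)
    return np.append(sigmas, sigma_0)

# CelebA endpoint choices quoted above: sigma_N grows with N.
celeba_sigma_max = {5: 40, 10: 80, 20: 100, 50: 120}
sigma_1 = 0.1  # placeholder value; the paper derives sigma_1 from the DDIM schedule
schedule = make_sigma_schedule(celeba_sigma_max[10], sigma_1, N=10)
```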