Optimizing DDPM Sampling with Shortcut Fine-Tuning

Authors: Ying Fan, Kangwook Lee

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through empirical evaluation, we demonstrate that our fine-tuning method can further enhance existing fast DDPM samplers, resulting in sample quality comparable to or even surpassing that of the full-step model across various datasets.
Researcher Affiliation | Academia | UW-Madison. Correspondence to: Ying Fan, Kangwook Lee <yfan87@wisc.edu, kangwook.lee@wisc.edu>.
Pseudocode | Yes | Algorithm 1 Shortcut Fine-Tuning with Policy Gradient and Baseline Regularization: SFT-PG (B)
Open Source Code | Yes | Code is available at https://github.com/UW-Madison-Lee-Lab/SFT-PG.
Open Datasets | Yes | We use MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015).
Dataset Splits | No | The paper mentions using specific training sample counts for MNIST, CIFAR-10, and CelebA, but does not explicitly provide details about how these datasets are split into training, validation, and test sets (e.g., percentages, specific split files, or cross-validation methodology).
Hardware Specification | Yes | For example, for CIFAR-10, progressive distillation takes about a day using 8 TPUv4 chips, while our method takes about 6h using 4 RTX 2080Ti, and the original DDPM training takes 10.6h using TPU v3-8.
Software Dependencies | No | The paper mentions using Adam (Kingma and Ba, 2014) as an optimizer, but does not specify version numbers for general software dependencies like Python, PyTorch, or other libraries.
Experiment Setup | Yes | For hyperparameters, we choose λ = 1.0, n_critic = 5, n_generator = 10, γ = 0.1, except when testing different choices of n_generator and γ in MNIST, where we use n_generator = 5 and varying γ. ... For optimizers, we use Adam (Kingma and Ba, 2014) with lr = 5 × 10⁻⁵ for the generator, and lr = 1 × 10⁻³ for both the critic and baseline functions. ... Both pretraining and fine-tuning use batch size 64 and we train 300 epochs for fine-tuning.
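
As a rough illustration of the quoted experiment setup, the sketch below wires the stated hyperparameters (λ, γ, n_critic, n_generator, batch size, epochs) into PyTorch Adam optimizers with the quoted learning rates. The network definitions and the alternating critic/generator update pattern are placeholders assumed for illustration; the authors' actual architectures and the loss terms of Algorithm 1 live in the linked SFT-PG repository and are not reproduced here.

```python
import torch
from torch import nn

# Placeholder networks for illustration only; the real generator (DDPM sampler),
# critic, and baseline architectures are in the authors' repository
# (https://github.com/UW-Madison-Lee-Lab/SFT-PG).
generator = nn.Sequential(nn.Linear(784, 784))
critic = nn.Sequential(nn.Linear(784, 1))
baseline = nn.Sequential(nn.Linear(1, 1))

# Hyperparameters as quoted in the Experiment Setup row above.
lam, gamma = 1.0, 0.1             # λ and γ
n_critic, n_generator = 5, 10     # updates per alternating round
batch_size, num_epochs = 64, 300  # fine-tuning schedule

# Adam optimizers with the quoted learning rates.
opt_g = torch.optim.Adam(generator.parameters(), lr=5e-5)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_b = torch.optim.Adam(baseline.parameters(), lr=1e-3)

# Assumed alternating update pattern implied by n_critic / n_generator;
# the actual loss computations from Algorithm 1 (SFT-PG (B)) are omitted.
for epoch in range(num_epochs):
    for _ in range(n_critic):
        pass  # critic (and baseline) update step would go here
    for _ in range(n_generator):
        pass  # policy-gradient generator update step would go here
```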