Optimizing DDPM Sampling with Shortcut Fine-Tuning
Authors: Ying Fan, Kangwook Lee
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluation, we demonstrate that our fine-tuning method can further enhance existing fast DDPM samplers, resulting in sample quality comparable to or even surpassing that of the full-step model across various datasets. |
| Researcher Affiliation | Academia | UW-Madison. Correspondence to: Ying Fan, Kangwook Lee <yfan87@wisc.edu, kangwook.lee@wisc.edu>. |
| Pseudocode | Yes | Algorithm 1 Shortcut Fine-Tuning with Policy Gradient and Baseline Regularization: SFT-PG (B) |
| Open Source Code | Yes | Code is available at https://github.com/UW-Madison-Lee-Lab/SFT-PG. |
| Open Datasets | Yes | We use MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015). |
| Dataset Splits | No | The paper reports the training sample counts used for MNIST, CIFAR-10, and CelebA, but does not explicitly describe how these datasets are split into training, validation, and test sets (e.g., percentages, specific split files, or a cross-validation methodology). |
| Hardware Specification | Yes | For example, for CIFAR-10, progressive distillation takes about a day using 8 TPUv4 chips, while our method takes about 6h using 4 RTX 2080Ti, and the original DDPM training takes 10.6h using TPU v3-8. |
| Software Dependencies | No | The paper mentions using Adam (Kingma and Ba, 2014) as an optimizer, but does not specify version numbers for general software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For hyperparameters, we choose λ = 1.0, n_critic = 5, n_generator = 10, γ = 0.1, except when testing different choices of n_generator and γ in MNIST, where we use n_generator = 5 and varying γ. ... For optimizers, we use Adam (Kingma and Ba, 2014) with lr = 5 × 10⁻⁵ for the generator, and lr = 1 × 10⁻³ for both the critic and baseline functions. ... Both pretraining and fine-tuning use batch size 64 and we train 300 epochs for fine-tuning. |
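
As a hedged illustration of the reported setup, the sketch below wires the quoted hyperparameters into a typical PyTorch configuration. The `generator`, `critic`, and `baseline` modules are hypothetical placeholders, not the architectures used in the paper; only the learning rates, batch size, epoch count, and loop counts come from the quoted text.

```python
# Minimal sketch of the reported SFT-PG hyperparameter/optimizer setup.
# The networks below are hypothetical stand-ins; the paper uses DDPM-style
# generator and critic/baseline networks (see the official repository).
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's experiment setup.
LAMBDA = 1.0          # regularization weight λ
N_CRITIC = 5          # critic updates per round (n_critic)
N_GENERATOR = 10      # generator updates per round (n_generator)
GAMMA = 0.1           # γ (the MNIST ablation uses n_generator = 5 with varying γ)
BATCH_SIZE = 64       # used for both pretraining and fine-tuning
FINETUNE_EPOCHS = 300 # fine-tuning epochs

# Hypothetical placeholder modules standing in for the real networks.
generator = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
critic = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))
baseline = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))

# Optimizers as reported: Adam with lr = 5e-5 for the generator and
# lr = 1e-3 for both the critic and baseline functions.
opt_generator = torch.optim.Adam(generator.parameters(), lr=5e-5)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_baseline = torch.optim.Adam(baseline.parameters(), lr=1e-3)
```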