Simple and Fast Distillation of Diffusion Models

Authors: Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, Siwei Lyu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning costs in few-step image generation tasks.
Researcher Affiliation | Academia | Zhenyu Zhou (1,2), Defang Chen (3), Can Wang (1,2), Chun Chen (1,2), Siwei Lyu (3). (1) Zhejiang University, State Key Laboratory of Blockchain and Data Security; (2) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; (3) University at Buffalo, State University of New York. {zhyzhou, defchern}@zju.edu.cn
Pseudocode | Yes | Algorithm 1 Trajectory Distillation
Open Source Code | Yes | Our code is available at https://github.com/zju-pi/diff-sampler.
Open Datasets | Yes | CIFAR10 32×32 [21], ImageNet 64×64 [43] and latent-space LSUN-Bedroom 256×256 [57]. For Stable Diffusion [41], we use the v1.5 checkpoint and generate images with a resolution of 512×512.
Dataset Splits | Yes | For text-to-image generation, we use a guidance scale of 7.5 to generate 5K images with prompts from the MS-COCO [23] validation set.
Hardware Specification | Yes | on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as PyTorch, CUDA, or other libraries used in the implementation.
Experiment Setup | Yes | The configuration obtained in Section 3.2 can be applied to different NFEs and datasets. Generally, in the training of SFD and SFD-v, we use DPM-Solver++(3M) [30] as the teacher solver with K = 4 (see Appendix D.2 for an ablation study on K). The adjusted t_min = 0.006, AFS, and the L1 loss introduced in Section 3.2 all lead to improved results. Minor changes are needed for text-to-image generation with Stable Diffusion, where we use DPM-Solver++(2M), the default solver for Stable Diffusion, with K = 3. In this case, t_min is increased from 0.03 to 0.1 and AFS is disabled due to the complex trajectory shown in Figure 9c. These experiment settings are collected in Table 6 in the Appendix.
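
The two setups described under Experiment Setup can be summarized as a small configuration object. The following is a minimal sketch, not the authors' code: the class name `DistillConfig`, its field names, and the solver strings `"dpmpp_3m"` and `"dpmpp_2m"` are hypothetical labels chosen for illustration, and keeping the L1 loss for Stable Diffusion is an assumption, since the text lists only the solver, K, t_min, and AFS as changes.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DistillConfig:
    """Hypothetical container for the settings quoted in the table above."""
    teacher_solver: str   # illustrative label, not an identifier from the repo
    K: int                # teacher solver setting K reported in the paper
    t_min: float          # minimum timestamp used during distillation
    use_afs: bool         # whether AFS is enabled
    loss: str = "l1"      # L1 loss; assumed unchanged for Stable Diffusion


# General setting for SFD and SFD-v, as quoted from the paper.
DEFAULT = DistillConfig(teacher_solver="dpmpp_3m", K=4, t_min=0.006, use_afs=True)

# Text-to-image setting with Stable Diffusion v1.5, as quoted from the paper.
STABLE_DIFFUSION = DistillConfig(teacher_solver="dpmpp_2m", K=3, t_min=0.1, use_afs=False)


def changed_fields(a: DistillConfig, b: DistillConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


if __name__ == "__main__":
    for name, (old, new) in changed_fields(DEFAULT, STABLE_DIFFUSION).items():
        print(f"{name}: {old} -> {new}")
```

Running the script lists the fields that change between the general and the Stable Diffusion setups, which mirrors the "minor changes" sentence in the Experiment Setup row. The authoritative values are in Table 6 of the paper's Appendix.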