Simple and Fast Distillation of Diffusion Models
Authors: Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, Siwei Lyu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning cost in the few-step image generation task. |
| Researcher Affiliation | Academia | Zhenyu Zhou (1,2), Defang Chen (3), Can Wang (1,2), Chun Chen (1,2), Siwei Lyu (3). (1) Zhejiang University, State Key Laboratory of Blockchain and Data Security; (2) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; (3) University at Buffalo, State University of New York. {zhyzhou, defchern}@zju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Trajectory Distillation |
| Open Source Code | Yes | Our code is available at https://github.com/zju-pi/diff-sampler. |
| Open Datasets | Yes | CIFAR10 32×32 [21], ImageNet 64×64 [43] and latent-space LSUN-Bedroom 256×256 [57]. For Stable Diffusion [41], we use the v1.5 checkpoint and generate images with a resolution of 512×512. |
| Dataset Splits | Yes | For text-to-image generation, we use a guidance scale of 7.5 to generate 5K images with prompts from the MS-COCO [23] validation set. |
| Hardware Specification | Yes | on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as PyTorch, CUDA, or other libraries used in the implementation. |
| Experiment Setup | Yes | The configuration obtained in Section 3.2 can be applied to different NFEs and datasets. Generally, in the training of SFD and SFD-v, we use DPM-Solver++(3M) [30] as the teacher solver with K = 4 (see Appendix D.2 for an ablation study on K). The use of the adjusted t_min = 0.006, AFS and the L1 loss introduced in Section 3.2 all lead to improved results. Minor changes are needed for text-to-image generation with Stable Diffusion, where we use DPM-Solver++(2M), the default setting in Stable Diffusion, and K = 3. In this case, t_min is increased from 0.03 to 0.1 and AFS is disabled due to the complex trajectory shown in Figure 9c. These experiment settings are collected in Table 6 in the Appendix. |
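
The settings quoted in the Experiment Setup row can be collected into a small configuration object. The following is only an illustrative sketch: the class name `DistillConfig`, its field names, and the solver identifier strings are hypothetical and are not taken from the authors' repository. It also assumes the L1 loss is kept for Stable Diffusion, since the quoted text lists only the solver, K, t_min and AFS as changes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DistillConfig:
    """Hypothetical container for the settings quoted in the table above."""
    teacher_solver: str   # teacher ODE solver named in the paper
    k: int                # the value K reported in the paper
    t_min: float          # adjusted minimum timestep
    use_afs: bool         # whether AFS is enabled
    loss: str = "l1"      # L1 loss, as introduced in Section 3.2


# General setting reported for SFD and SFD-v.
DEFAULT = DistillConfig(
    teacher_solver="DPM-Solver++(3M)",
    k=4,
    t_min=0.006,
    use_afs=True,
)

# Stable Diffusion v1.5 setting: only the fields the paper says change.
# Assumption: the loss stays L1 because no change is mentioned.
STABLE_DIFFUSION = replace(
    DEFAULT,
    teacher_solver="DPM-Solver++(2M)",
    k=3,
    t_min=0.1,
    use_afs=False,
)

if __name__ == "__main__":
    print(DEFAULT)
    print(STABLE_DIFFUSION)
```

Deriving the Stable Diffusion entry from the default with `dataclasses.replace` keeps the differences between the two setups explicit, mirroring the "minor changes" wording of the quoted text.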