EM Distillation for One-step Diffusion Models

Authors: Sirui Xie, Zhisheng Xiao, Diederik Kingma, Tingbo Hou, Ying Nian Wu, Kevin P. Murphy, Tim Salimans, Ben Poole, Ruiqi Gao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models. |
| Researcher Affiliation | Collaboration | ¹Google DeepMind, ²Google Research, ³UCLA |
| Pseudocode | Yes | Algorithm 1: EM Distillation (a hedged sketch follows the table below). |
| Open Source Code | No | We have not open sourced the model or code, but our approach is data-free so no training data is required. We also provide implementation details in the appendix that we hope are sufficient for reproducing our results. |
| Open Datasets | Yes | We employ EMD to learn one-step image generators on ImageNet 64×64, ImageNet 128×128 [60], and text-to-image generation. |
| Dataset Splits | No | The paper does not explicitly state details about the validation dataset split (e.g., percentages, sample counts, or explicit mention of a validation set used in their experiments) beyond referencing the overall datasets. |
| Hardware Specification | Yes | We run the distillation training for 300k steps (roughly 8 days) on 64 TPU-v4. We run the distillation training for 200k steps (roughly 10 days) on 128 TPU-v5p. Our method, EMD-8, trained on 256 TPU-v5e for 5 hours (5000 steps)... |
| Software Dependencies | No | The paper discusses software components and models (e.g., Stable Diffusion v1.5, Adam optimizer) but does not list specific version numbers for the software dependencies required for replication. |
| Experiment Setup | Yes | We list other hyperparameters in Table 7. We list other hyperparameters in Table 8. We list other hyperparameters in Table 9. |
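
The checklist notes that the paper provides pseudocode (Algorithm 1: EM Distillation) but no released code. The sketch below is not the authors' algorithm; it is a minimal, hedged illustration of the general structure such a distillation step could take, assuming a frozen teacher score network, a one-step student generator, and an auxiliary "fake" score network, with the E-step reduced to a few Langevin updates on the noised sample and the M-step to a Diff-Instruct/VSD-style score-difference update. All function names, shapes, and step sizes here are hypothetical; the paper's actual E-step operates on the joint (z, x) space and its gradient estimator may differ.

```python
# Hedged sketch of an EM-style distillation step (NOT the paper's Algorithm 1).
# Assumptions: teacher_score(x_t, t) and fake_score(x_t, t) return score
# estimates, generator(z) is a one-step sampler, sigma(t) is a noise schedule.
import torch


def langevin_refine(x_t, t, teacher_score, n_steps=4, step_size=1e-3):
    """E-step sketch: a few Langevin updates pushing noisy student samples
    toward the teacher distribution (teacher_score is assumed frozen)."""
    for _ in range(n_steps):
        noise = torch.randn_like(x_t)
        x_t = (x_t + step_size * teacher_score(x_t, t)
               + (2.0 * step_size) ** 0.5 * noise)
    return x_t


def emd_style_step(generator, teacher_score, fake_score,
                   opt_gen, opt_fake, sigma, batch_size=16):
    # One-step generation from the student.
    z = torch.randn(batch_size, 3, 64, 64)
    x = generator(z)

    # Forward-diffuse the student sample at a random noise level.
    t = torch.rand(batch_size)
    s = sigma(t)[:, None, None, None]
    eps = torch.randn_like(x)
    x_t = x + s * eps

    # E-step (sketch): MCMC-refine the noisy samples with the teacher score.
    with torch.no_grad():
        x_ref = langevin_refine(x_t, t, teacher_score)
        # Score difference evaluated at the refined samples; detached so the
        # generator gradient flows only through x_t below.
        grad = fake_score(x_ref, t) - teacher_score(x_ref, t)

    # M-step (sketch): surrogate loss whose gradient w.r.t. generator params
    # is (fake_score - teacher_score) * dx_t/dphi, i.e. gradient descent
    # moves student samples toward regions the teacher assigns higher density.
    loss_gen = (grad * x_t).sum() / batch_size
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()

    # Keep the auxiliary "fake" score network fitted to the current student
    # distribution with standard denoising score matching.
    x_t_det = x.detach() + s * eps
    loss_fake = ((fake_score(x_t_det, t) + eps / s) ** 2).mean()
    opt_fake.zero_grad()
    loss_fake.backward()
    opt_fake.step()
    return loss_gen.item(), loss_fake.item()
```

In a usage scenario, `teacher_score`, `fake_score`, and `generator` would be diffusion-style U-Nets with their own optimizers, and the hyperparameters referenced in Tables 7-9 of the paper (learning rates, number of MCMC steps, noise schedules) would replace the placeholder defaults above.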