Improved Distribution Matching Distillation for Fast Image Synthesis

Authors: Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (5 experiments) | "We evaluate our approach, DMD2, using several benchmarks, including class-conditional image generation on ImageNet-64×64 [62], and text-to-image synthesis on COCO 2014 [63] with various teacher models [1, 58]. We use the Fréchet Inception Distance (FID) [60] to measure image quality and diversity, and the CLIP Score [64] to evaluate text-to-image alignment." (metric computation sketched below the table)
Researcher Affiliation | Collaboration | "¹Massachusetts Institute of Technology, ²Adobe Research"
Pseudocode | Yes | "Algorithm 1: DMD (original). Input: pretrained real diffusion model μ_real, paired ODE solution pairs D = {(z_ref, y_ref)}. Output: trained generator G"
Open Source Code | Yes | "We release our code and pretrained models."
Open Datasets | Yes | "Our generators are trained by distilling SDXL [58] and SD v1.5 [1], respectively, using a subset of 3 million prompts from LAION-Aesthetics [59]." (teacher loading sketched below the table)
Dataset Splits | No | The paper uses the COCO 2014 validation set for evaluation, but does not specify explicit train/validation/test splits for the training data (e.g., LAION-Aesthetics), nor a validation split used during model training.
Hardware Specification | Yes | "We use a batch size of 280 and train the model on 7 A100 GPUs for 200K iterations."
Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify software versions for the libraries, frameworks, or programming languages used (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "For the standard training setup, we use the AdamW optimizer [88] with a learning rate of 2×10⁻⁶, a weight decay of 0.01, and beta parameters (0.9, 0.999). We use a batch size of 280 and train the model on 7 A100 GPUs for 200K iterations, which takes approximately 2 days. The number of fake diffusion model updates per generator update is set to 5. The weight for the GAN loss is set to 3×10⁻³." (optimizer and update schedule sketched below)
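Below is a minimal, runnable PyTorch sketch of the optimization schedule quoted in the Experiment Setup row (AdamW with learning rate 2×10⁻⁶, weight decay 0.01, betas (0.9, 0.999), five fake diffusion model updates per generator update, GAN loss weight 3×10⁻³). The networks and losses are toy stand-ins chosen for illustration only; they are not the authors' released implementation.

```python
# Toy sketch of the update schedule described in the "Experiment Setup" row.
# The modules and losses are placeholders, NOT the authors' code.
import torch
import torch.nn as nn

generator = nn.Linear(16, 16)       # stand-in for the one-step student generator
fake_score_net = nn.Linear(16, 16)  # stand-in for the "fake" diffusion model

opt_g = torch.optim.AdamW(generator.parameters(),
                          lr=2e-6, weight_decay=0.01, betas=(0.9, 0.999))
opt_fake = torch.optim.AdamW(fake_score_net.parameters(),
                             lr=2e-6, weight_decay=0.01, betas=(0.9, 0.999))

GAN_WEIGHT = 3e-3         # weight for the GAN loss term
FAKE_UPDATES_PER_GEN = 5  # fake diffusion model updates per generator update

for step in range(10):    # the paper trains for 200K iterations
    noise = torch.randn(4, 16)

    # 1) Several updates of the fake score network on samples drawn from the
    #    current generator.
    for _ in range(FAKE_UPDATES_PER_GEN):
        opt_fake.zero_grad()
        fake_images = generator(noise).detach()
        loss_fake = fake_score_net(fake_images).pow(2).mean()  # toy denoising loss
        loss_fake.backward()
        opt_fake.step()

    # 2) One generator update combining a distribution-matching term and a
    #    GAN term, mirroring the loss weighting quoted above.
    opt_g.zero_grad()
    fake_images = generator(noise)
    loss_dmd = fake_score_net(fake_images).mean()  # toy DMD surrogate
    loss_gan = fake_images.pow(2).mean()           # toy GAN surrogate
    (loss_dmd + GAN_WEIGHT * loss_gan).backward()
    opt_g.step()
```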
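The Research Type row cites FID and CLIP Score as the evaluation metrics. One possible way to compute them is with the torchmetrics library, as sketched below; the paper does not state which metric implementation was used, so the tooling choice here is an assumption for illustration only (it additionally needs the torch-fidelity and transformers backends).

```python
# One possible way to compute the two metrics named in the paper (FID and
# CLIP Score), here via torchmetrics; treat this purely as an illustration.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

fid = FrechetInceptionDistance(feature=2048)
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Dummy uint8 images standing in for reference COCO images and generated samples.
real_images = torch.randint(0, 256, (8, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (8, 3, 299, 299), dtype=torch.uint8)
prompts = ["a photo of a dog"] * 8

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())

clip_score.update(fake_images, prompts)
print("CLIP Score:", clip_score.compute().item())
```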
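The Open Datasets row names SDXL and SD v1.5 as the teacher models; both are public checkpoints. A hedged sketch of loading them with Hugging Face diffusers follows; the repository IDs are the commonly referenced public ones and are an assumption, since the paper does not specify how the checkpoints were obtained.

```python
# Loading the two teachers named in the paper (SD v1.5 and SDXL) via
# Hugging Face diffusers. The repo IDs are assumed for illustration.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionXLPipeline

sd15 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)

# The distillation targets are the denoising UNets inside these pipelines.
teacher_unet_sd15 = sd15.unet
teacher_unet_sdxl = sdxl.unet
```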