Improved Distribution Matching Distillation for Fast Image Synthesis
Authors: Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach, DMD2, using several benchmarks, including class-conditional image generation on ImageNet-64×64 [62], and text-to-image synthesis on COCO 2014 [63] with various teacher models [1, 58]. We use the Fréchet Inception Distance (FID) [60] to measure image quality and diversity, and the CLIP Score [64] to evaluate text-to-image alignment. (An illustrative metric computation follows the table.) |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²Adobe Research |
| Pseudocode | Yes | Algorithm 1: DMD (original). Input: pretrained real diffusion model μ_real, paired ODE solution pairs D = {z_ref, y_ref}. Output: trained generator G. (A minimal sketch of the implied regression setup follows the table.) |
| Open Source Code | Yes | We release our code and pretrained models. |
| Open Datasets | Yes | Our generators are trained by distilling SDXL [58] and SD v1.5 [1], respectively, using a subset of 3 million prompts from LAION-Aesthetics [59]. |
| Dataset Splits | No | The paper mentions using 'COCO 2014 validation set' for evaluation purposes, but does not specify explicit train/validation/test splits for the datasets used for training (e.g., LAION-Aesthetics) nor a validation split used during model training. |
| Hardware Specification | Yes | We use a batch size of 280 and train the model on 7 A100 GPUs for 200K iterations |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify versions for the software it depends on (e.g., Python, PyTorch, or CUDA). |
| Experiment Setup | Yes | For the standard training setup, we use the AdamW optimizer [88] with a learning rate of 2×10⁻⁶, a weight decay of 0.01, and beta parameters (0.9, 0.999). We use a batch size of 280 and train the model on 7 A100 GPUs for 200K iterations, which takes approximately 2 days. The number of fake diffusion model updates per generator update is set to 5. The weight for the GAN loss is set to 3×10⁻³. (A sketch of this schedule follows the table.) |
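
The Research Type row cites FID [60] and CLIP Score [64] as the evaluation metrics. The paper does not say which metric implementations it used, so the sketch below shows one plausible off-the-shelf toolchain (clean-fid for FID, torchmetrics for CLIP Score); the folder names and the image batch are placeholders.

```python
# One plausible way to compute the two reported metrics; the paper does not
# name its FID/CLIP implementations, so treat this as illustrative only.
import torch
from cleanfid import fid                                   # pip install clean-fid
from torchmetrics.multimodal.clip_score import CLIPScore   # pip install torchmetrics

# FID between generated samples and reference images (placeholder paths).
fid_value = fid.compute_fid("generated_samples/", "coco2014_val_images/")
print(f"FID: {fid_value:.2f}")

# CLIP Score between generated images (uint8, NCHW) and their prompts.
clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
images = torch.randint(0, 255, (4, 3, 512, 512), dtype=torch.uint8)  # stand-in batch
prompts = ["a photo of a cat"] * 4                                   # stand-in prompts
print(f"CLIP Score: {clip_metric(images, prompts).item():.2f}")
```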
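The Pseudocode row quotes the header of Algorithm 1, the original DMD recipe whose inputs include paired ODE solutions D = {z_ref, y_ref}. Here is a minimal sketch of the regression term those pairs imply, assuming the LPIPS distance reported in the original DMD paper; `generator` is a hypothetical stand-in for the one-step student.

```python
# Sketch of the paired-ODE regression objective implied by Algorithm 1's
# inputs: the one-step generator should reproduce the teacher's ODE solution
# y_ref from the stored noise z_ref. LPIPS is an assumption carried over from
# the original DMD recipe; the generator itself is a hypothetical stand-in.
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual distance, expects images in [-1, 1]

def regression_loss(generator: torch.nn.Module,
                    z_ref: torch.Tensor, y_ref: torch.Tensor) -> torch.Tensor:
    """Pull the generator's output on z_ref toward the precomputed y_ref."""
    return lpips_fn(generator(z_ref), y_ref).mean()
```

A central improvement in DMD2 is dropping this regression term, which removes the need to precompute ODE solution pairs from the teacher.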
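The Experiment Setup row fixes the optimizer and update schedule. Below is a minimal sketch wiring in the reported hyperparameters (AdamW at 2×10⁻⁶, five fake-model updates per generator update, GAN weight 3×10⁻³); the two networks and all three losses are hypothetical placeholders, not the authors' released implementation.

```python
# Minimal sketch of the reported update schedule. Everything except the
# hyperparameter values (lr, weight decay, betas, 5:1 update ratio, GAN
# weight, iteration count) is a hypothetical placeholder.
import torch
from torch import nn

generator = nn.Linear(16, 16)   # stand-in for the one-step student generator
fake_score = nn.Linear(16, 16)  # stand-in for the fake diffusion model / GAN critic

opt_g = torch.optim.AdamW(generator.parameters(),
                          lr=2e-6, weight_decay=0.01, betas=(0.9, 0.999))
opt_f = torch.optim.AdamW(fake_score.parameters(),
                          lr=2e-6, weight_decay=0.01, betas=(0.9, 0.999))

GAN_WEIGHT = 3e-3    # reported weight on the GAN loss
F_STEPS = 5          # fake diffusion model updates per generator update

def fake_model_loss(x_fake: torch.Tensor) -> torch.Tensor:
    # Placeholder for the denoising + discriminator objective on fake samples.
    return fake_score(x_fake).pow(2).mean()

def dm_loss(x_fake: torch.Tensor) -> torch.Tensor:
    # Placeholder for the distribution-matching objective.
    return fake_score(x_fake).mean()

def gan_loss(x_fake: torch.Tensor) -> torch.Tensor:
    # Placeholder for the GAN generator term.
    return -fake_score(x_fake).mean()

for step in range(200_000):  # reported iteration count
    for _ in range(F_STEPS):                       # fake model updates first
        x_fake = generator(torch.randn(4, 16)).detach()
        opt_f.zero_grad()
        fake_model_loss(x_fake).backward()
        opt_f.step()

    x_fake = generator(torch.randn(4, 16))         # then one generator update
    opt_g.zero_grad()
    (dm_loss(x_fake) + GAN_WEIGHT * gan_loss(x_fake)).backward()
    opt_g.step()
```

Alternating several critic updates per generator update mirrors the 5:1 ratio quoted above; the placeholder losses only establish the control flow, not the actual DMD2 objectives.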