Guiding a Diffusion Model with a Bad Version of Itself
Authors: Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64×64 and 1.25 for 512×512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality. |
| Researcher Affiliation | Collaboration | Tero Karras (NVIDIA), Miika Aittala (NVIDIA), Tuomas Kynkäänniemi (Aalto University), Jaakko Lehtinen (NVIDIA, Aalto University), Timo Aila (NVIDIA), Samuli Laine (NVIDIA) |
| Pseudocode | Yes | Algorithm 1: Reproducing our FID result for the Autoguidance (XS, T/16) row in Table 1. Algorithm 2: Training the additional EDM2 models needed in Section 5. |
| Open Source Code | Yes | Our implementation and pre-trained models are available at https://github.com/NVlabs/edm2 |
| Open Datasets | Yes | Our primary evaluation is carried out using ImageNet (ILSVRC2012) [8] at two resolutions: 512×512 and 64×64. [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, 2009. |
| Dataset Splits | No | The paper does not explicitly provide percentages or counts for training, validation, and test splits, nor does it reference predefined validation splits with citations. While it discusses hyperparameter tuning, it does not specify a dedicated validation split. |
| Hardware Specification | Yes | We performed our main experiments on top of the publicly available EDM2 [24] codebase using NVIDIA A100 GPUs |
| Software Dependencies | Yes | We performed our main experiments on top of the publicly available EDM2 [24] codebase using NVIDIA A100 GPUs, Python 3.11.7, PyTorch 2.2.0, CUDA 11.8, and cuDNN 8.9.7. |
| Experiment Setup | Yes | We use the EDM2-S and EDM2-XXL models with default sampling parameters: 32 deterministic steps with a 2nd-order Heun sampler [23]. We train D1 for 4096 iterations using a batch size of 4096 samples, and D0 for 512 iterations. We set P_mean = 2.3 and P_std = 1.5, and use an α_ref / √max(t/t_ref, 1) learning-rate decay schedule with α_ref = 0.01 and t_ref = 512 iterations, along with a power function EMA profile [24] with σ_rel = 0.010. |
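The learning-rate schedule quoted in the Experiment Setup row can be written out directly. Below is a minimal sketch in Python, not the authors' code; the function name `lr_at_iteration` is ours, and it assumes `t` counts training iterations with the stated α_ref = 0.01 and t_ref = 512.

```python
import math

def lr_at_iteration(t, alpha_ref=0.01, t_ref=512):
    """Inverse-square-root decay quoted in the setup row:
    alpha_ref / sqrt(max(t / t_ref, 1)) -- constant for the first
    t_ref iterations, then decaying proportionally to 1 / sqrt(t)."""
    return alpha_ref / math.sqrt(max(t / t_ref, 1.0))

print(lr_at_iteration(256))   # 0.01  (still in the constant phase)
print(lr_at_iteration(2048))  # 0.005 (t / t_ref = 4, so decayed by sqrt(4) = 2)
```

For context on the method named in the title, autoguidance combines the main model D1 with a degraded version D0 of itself by extrapolating away from the weak model. The sketch below is our illustration under that reading, not the released implementation (which lives in the EDM2 repository linked above); `autoguided_denoiser`, the toy "denoisers", and the weight w = 2.0 are placeholders.

```python
def autoguided_denoiser(D1, D0, x, sigma, w=2.0):
    """Guide the main model D1 with a weaker version D0 of itself:
    D_w(x; sigma) = w * D1(x; sigma) + (1 - w) * D0(x; sigma).
    For w > 1 this pushes the output toward where the strong model
    improves on the weak one."""
    return w * D1(x, sigma) + (1.0 - w) * D0(x, sigma)

# Toy usage with scalar stand-ins for the EDM2 networks.
strong = lambda x, sigma: 0.9 * x   # hypothetical strong model D1
weak   = lambda x, sigma: 0.7 * x   # hypothetical weak model D0
print(autoguided_denoiser(strong, weak, x=1.0, sigma=0.5))  # 1.1 = 2*0.9 - 1*0.7
```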