Guiding a Diffusion Model with a Bad Version of Itself

Authors: Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64×64 and 1.25 for 512×512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.
Researcher Affiliation | Collaboration | Tero Karras (NVIDIA), Miika Aittala (NVIDIA), Tuomas Kynkäänniemi (Aalto University), Jaakko Lehtinen (NVIDIA, Aalto University), Timo Aila (NVIDIA), Samuli Laine (NVIDIA)
Pseudocode | Yes | Algorithm 1: Reproducing our FID result for the Autoguidance (XS, T/16) row in Table 1. Algorithm 2: Training the additional EDM2 models needed in Section 5. (A sketch of the autoguidance combination behind Algorithm 1 appears below the table.)
Open Source Code | Yes | Our implementation and pre-trained models are available at https://github.com/NVlabs/edm2
Open Datasets | Yes | Our primary evaluation is carried out using ImageNet (ILSVRC2012) [8] at two resolutions: 512×512 and 64×64. [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, 2009.
Dataset Splits | No | The paper does not explicitly provide percentages or counts for training, validation, and test splits, nor does it reference predefined validation splits with citations. While it discusses hyperparameter tuning, it does not specify a dedicated validation split.
Hardware Specification | Yes | We performed our main experiments on top of the publicly available EDM2 [24] codebase using NVIDIA A100 GPUs.
Software Dependencies | Yes | We performed our main experiments on top of the publicly available EDM2 [24] codebase using NVIDIA A100 GPUs, Python 3.11.7, PyTorch 2.2.0, CUDA 11.8, and cuDNN 8.9.7.
Experiment Setup | Yes | We use the EDM2-S and EDM2-XXL models with default sampling parameters: 32 deterministic steps with a 2nd-order Heun sampler [23]. We train D1 for 4096 iterations using a batch size of 4096 samples, and D0 for 512 iterations. We set P_mean = 2.3 and P_std = 1.5, and use an α_ref / √(max(t/t_ref, 1)) learning-rate decay schedule with α_ref = 0.01 and t_ref = 512 iterations, along with a power-function EMA profile [24] with σ_rel = 0.010. (Sketches of the Heun sampler and the learning-rate schedule appear below the table.)
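
For orientation, the core operation behind Algorithm 1 is the autoguidance combination the paper is named after: the main model's prediction is extrapolated away from that of a degraded version of itself. Below is a minimal sketch, assuming D1 is the main denoiser and D0 the smaller and/or less-trained guide; the function name and the toy stand-in models are ours, not the EDM2 API.

```python
import torch

def autoguided_denoiser(D1, D0, x, sigma, w):
    # Autoguidance: extrapolate past the degraded model's prediction toward
    # the main model's, D0 + w * (D1 - D0), equivalently D1 + (w - 1) * (D1 - D0).
    # w = 1 recovers the main model alone; w > 1 amplifies the correction.
    d1 = D1(x, sigma)  # main (high-quality) denoiser
    d0 = D0(x, sigma)  # "bad version of itself": smaller and/or less trained
    return d1 + (w - 1.0) * (d1 - d0)

# Hypothetical stand-ins for the two denoisers, not the real EDM2 networks.
D1 = lambda x, sigma: 0.9 * x
D0 = lambda x, sigma: 0.7 * x
x = torch.randn(4, 3, 64, 64)  # batch of noisy 64x64 RGB images
print(autoguided_denoiser(D1, D0, x, sigma=1.0, w=2.0).shape)  # (4, 3, 64, 64)
```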
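
The sampling setup quoted in the Experiment Setup row (32 deterministic steps, 2nd-order Heun) follows the EDM formulation [23]. A rough sketch of such a sampler, assuming the ODE form dx/dσ = (x − D(x, σ))/σ; the exact noise schedule and other details live in the edm2 codebase, and any denoiser, including the autoguided one above, can be passed as D.

```python
import torch

def heun_sampler(D, x, sigmas):
    # Deterministic 2nd-order Heun integration of the probability-flow ODE,
    # assumed here in the EDM form dx/dsigma = (x - D(x, sigma)) / sigma,
    # over a decreasing noise schedule sigmas[0] > ... > sigmas[-1] = 0.
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - D(x, s)) / s               # Euler slope at the current sigma
        x_euler = x + (s_next - s) * d      # tentative Euler step
        if s_next > 0:
            d_next = (x_euler - D(x_euler, s_next)) / s_next
            x = x + (s_next - s) * 0.5 * (d + d_next)  # Heun (trapezoid) correction
        else:
            x = x_euler                     # final step to sigma = 0: Euler only
    return x

# Example with a toy denoiser; the schedule below is a placeholder, not EDM2's.
D = lambda x, sigma: 0.9 * x
x = torch.randn(4, 3, 64, 64) * 80.0       # start from pure noise at sigma_max
sigmas = [80.0 * (1 - i / 31) for i in range(32)]
print(heun_sampler(D, x, sigmas).shape)
```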
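
The learning-rate schedule α_ref / √(max(t/t_ref, 1)) quoted above is simple enough to state in code. A minimal sketch with the paper's reported constants (α_ref = 0.01, t_ref = 512 iterations); the function name is ours.

```python
def lr_schedule(t, alpha_ref=0.01, t_ref=512):
    # Constant at alpha_ref for the first t_ref iterations, then
    # inverse-square-root decay: alpha_ref / sqrt(t / t_ref).
    return alpha_ref / max(t / t_ref, 1.0) ** 0.5

print(lr_schedule(256))   # 0.01   (before t_ref, no decay)
print(lr_schedule(2048))  # 0.005  (t/t_ref = 4, decayed by sqrt(4) = 2)
```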