Elucidating the Design Space of Diffusion-Based Generative Models
Authors: Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36. |
| Researcher Affiliation | Industry | Tero Karras NVIDIA tkarras@nvidia.com Miika Aittala NVIDIA maittala@nvidia.com Timo Aila NVIDIA taila@nvidia.com Samuli Laine NVIDIA slaine@nvidia.com |
| Pseudocode | Yes | Algorithm 1 Deterministic sampling using Heun's 2nd order method with arbitrary σ(t) and s(t). Algorithm 2 Our stochastic sampler with σ(t) = t and s(t) = 1. A minimal sketch of the deterministic sampler appears below the table. |
| Open Source Code | Yes | Our implementation and pre-trained models are available at https://github.com/NVlabs/edm |
| Open Datasets | Yes | We evaluate the DDPM++ cont. (VP) and NCSN++ cont. (VE) models by Song et al. [48] trained on unconditional CIFAR-10 [28] at 32×32... We also evaluate the ADM (dropout) model by Dhariwal and Nichol [9] trained on class-conditional ImageNet [8] at 64×64... |
| Dataset Splits | Yes | For CIFAR-10, we used the official 50,000 training images and 10,000 test images. For FFHQ and AFHQv2, we used the official training splits of 70,000 and 15,220 images, respectively. |
| Hardware Specification | Yes | Our project consumed 250MWh on an in-house cluster of NVIDIA V100s. |
| Software Dependencies | No | The paper mentions that models are implemented in PyTorch and trained using NVIDIA CUDA, but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | Table 1 presents formulas for reproducing deterministic variants of three earlier methods in our framework... Parameters βd = 19.9, βmin = 0.1; σmin = 0.002, σmax = 80... We set ρ = 7 for the remainder of this paper. Appendix F.3 'Training settings' also provides detailed configuration information. A sketch of the resulting noise-level schedule appears below the table. |
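
The sampling parameters quoted in the Experiment Setup row (σmin = 0.002, σmax = 80, ρ = 7) define a noise-level schedule in which σ^(1/ρ) is spaced linearly across the steps, with an extra final level of zero. The sketch below is a minimal PyTorch rendering of that formula; the function name `edm_sigma_steps` and the default of 18 steps are illustrative choices, not the released code.

```python
import torch

def edm_sigma_steps(n_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels sigma_0 > ... > sigma_{N-1}, spaced so that sigma**(1/rho)
    is linear in the step index, followed by an extra final level of 0."""
    i = torch.arange(n_steps, dtype=torch.float64)
    sigmas = (sigma_max ** (1 / rho)
              + i / (n_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    return torch.cat([sigmas, torch.zeros(1, dtype=torch.float64)])
```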
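
The deterministic sampler quoted under Pseudocode (Heun's 2nd order method with σ(t) = t, s(t) = 1) amounts to Heun's method applied to the probability-flow ODE dx/dt = (x − D(x; t)) / t. Below is a hedged sketch under that reading: `denoise(x, sigma)` is an assumed callable returning the denoiser output D(x; σ), and the function name `heun_sampler` is illustrative rather than the paper's API.

```python
import torch

@torch.no_grad()
def heun_sampler(denoise, x_shape, sigmas, device="cpu"):
    """Deterministic 2nd-order (Heun) sampler for sigma(t) = t, s(t) = 1.

    denoise(x, sigma): assumed interface returning D(x; sigma), the model's
        estimate of the clean image at noise level sigma.
    sigmas: decreasing noise levels ending in 0, e.g. from edm_sigma_steps().
    """
    # Start from pure noise at the largest noise level t_0.
    x = sigmas[0] * torch.randn(x_shape, dtype=torch.float64, device=device)
    for i in range(len(sigmas) - 1):
        t_cur, t_next = sigmas[i], sigmas[i + 1]
        # Euler step along the ODE derivative d = (x - D(x; t)) / t.
        d_cur = (x - denoise(x, t_cur)) / t_cur
        x_next = x + (t_next - t_cur) * d_cur
        # Heun (2nd-order) correction; skipped on the last step, where t_next = 0.
        if t_next > 0:
            d_next = (x_next - denoise(x_next, t_next)) / t_next
            x_next = x + (t_next - t_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x
```

With the 18-level schedule above, this loop calls `denoise` 2·18 − 1 = 35 times, consistent with the "35 network evaluations per image" quoted in the Research Type row.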