Elucidating the Design Space of Diffusion-Based Generative Models

Authors: Tero Karras, Miika Aittala, Timo Aila, Samuli Laine

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.
Researcher Affiliation | Industry | Tero Karras (NVIDIA, tkarras@nvidia.com); Miika Aittala (NVIDIA, maittala@nvidia.com); Timo Aila (NVIDIA, taila@nvidia.com); Samuli Laine (NVIDIA, slaine@nvidia.com)
Pseudocode | Yes | Algorithm 1: Deterministic sampling using Heun's 2nd order method with arbitrary σ(t) and s(t). Algorithm 2: Our stochastic sampler with σ(t) = t and s(t) = 1. (A sketch of Algorithm 1 follows the table.)
Open Source Code | Yes | Our implementation and pre-trained models are available at https://github.com/NVlabs/edm
Open Datasets | Yes | We evaluate the DDPM++ cont. (VP) and NCSN++ cont. (VE) models by Song et al. [48] trained on unconditional CIFAR-10 [28] at 32×32... We also evaluate the ADM (dropout) model by Dhariwal and Nichol [9] trained on class-conditional ImageNet [8] at 64×64...
Dataset Splits | Yes | For CIFAR-10, we used the official 50,000 training images and 10,000 test images. For FFHQ and AFHQv2, we used the official training splits of 70,000 and 15,220 images, respectively.
Hardware Specification | Yes | Our project consumed 250 MWh on an in-house cluster of NVIDIA V100s.
Software Dependencies | No | The paper mentions that models are implemented in PyTorch and trained using NVIDIA CUDA, but it does not specify version numbers for these software components.
Experiment Setup | Yes | Table 1 presents formulas for reproducing deterministic variants of three earlier methods in our framework... Parameters βd = 19.9, βmin = 0.1; σmin = 0.002, σmax = 80... We set ρ = 7 for the remainder of this paper. Appendix F.3 'Training settings' also provides detailed configuration information. (The time-step discretization implied by these parameters is sketched below.)
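
The σmin, σmax, and ρ = 7 values quoted under 'Experiment Setup' define the paper's time-step discretization, σ_i = (σmax^(1/ρ) + i/(N−1)·(σmin^(1/ρ) − σmax^(1/ρ)))^ρ for i = 0, ..., N−1, which places σ_0 = σmax and σ_{N−1} = σmin exactly. A minimal NumPy sketch of that formula follows; the function name edm_sigma_steps is our own, and the default num_steps = 18 matches the 35-evaluation CIFAR-10 setting quoted above.

```python
import numpy as np

def edm_sigma_steps(num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    # By construction, sigma_0 = sigma_max and sigma_{N-1} = sigma_min exactly.
    i = np.arange(num_steps)
    return (sigma_max ** (1 / rho)
            + i / (num_steps - 1)
            * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

sigmas = edm_sigma_steps()  # 18 noise levels decreasing from 80 to 0.002
assert np.isclose(sigmas[0], 80.0) and np.isclose(sigmas[-1], 0.002)
```

In the paper, ρ controls how strongly the steps near σmin are shortened at the expense of longer steps near σmax; ρ = 7 was found to work well across datasets.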
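The 'Pseudocode' row refers to Algorithm 1, the deterministic Heun sampler. Below is a minimal PyTorch sketch of that sampler specialized to σ(t) = t, s(t) = 1, assuming a denoiser callable denoise(x, sigma) ≈ D(x; σ); the name and exact interface of that callable are our assumption, not the repo's API. The sigmas argument is the schedule from the previous sketch, with a final t_N = 0 appended inside the function.

```python
import torch

def heun_sampler(denoise, latents, sigmas):
    # Sketch in the spirit of the paper's Algorithm 1, specialized to
    # sigma(t) = t, s(t) = 1, where the probability-flow ODE reduces to
    # dx/dt = (x - D(x; t)) / t.
    t = torch.cat([torch.as_tensor(sigmas, dtype=torch.float64),
                   torch.zeros(1, dtype=torch.float64)])  # append t_N = 0
    x = latents.to(torch.float64) * t[0]                  # x_0 ~ N(0, t_0^2 I)
    for i in range(len(t) - 1):
        t_cur, t_next = t[i], t[i + 1]
        d = (x - denoise(x, t_cur)) / t_cur               # ODE slope at t_i
        x_next = x + (t_next - t_cur) * d                 # Euler step
        if t_next > 0:                                    # Heun's 2nd-order correction
            d_next = (x_next - denoise(x_next, t_next)) / t_next
            x_next = x + (t_next - t_cur) * (d + d_next) / 2
        x = x_next
    return x
```

With 18 noise levels this costs 2·18 − 1 = 35 denoiser evaluations, since the second-order correction is skipped on the final step to t = 0, matching the "35 network evaluations per image" figure quoted in the table.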