Diffusion Models Beat GANs on Image Synthesis

Authors: Prafulla Dhariwal, Alexander Nichol

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Table 4 summarizes our results. ADM refers to our ablated diffusion model, and ADM-G additionally uses classifier guidance. Our diffusion models can obtain the best FID on each task, and the best sFID on all but one task.
Researcher Affiliation | Industry | Prafulla Dhariwal, OpenAI, prafulla@openai.com; Alex Nichol, OpenAI, alex@openai.com
Pseudocode | Yes | Algorithm 1: Classifier-guided diffusion sampling, given a diffusion model (µ_θ(x_t), Σ_θ(x_t)), classifier p_φ(y|x_t), and gradient scale s. Algorithm 2: Classifier-guided DDIM sampling, given a diffusion model ε_θ(x_t), classifier p_φ(y|x_t), and gradient scale s.
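The sampling step in Algorithm 1 shifts the model's predicted mean by the scaled classifier gradient before drawing the next latent, i.e. x_{t-1} ~ N(µ_θ(x_t) + s Σ_θ(x_t) ∇_{x_t} log p_φ(y|x_t), Σ_θ(x_t)). A minimal numpy sketch of that one step, assuming a diagonal covariance and with `mu`, `sigma`, and `classifier_grad` standing in for the model outputs (all names here are illustrative, not the paper's released API):

```python
import numpy as np

def guided_step(mu, sigma, classifier_grad, s, rng):
    """One classifier-guided sampling step (sketch of Algorithm 1).

    mu              : predicted mean mu_theta(x_t)
    sigma           : diagonal variance Sigma_theta(x_t)
    classifier_grad : grad_{x_t} log p_phi(y | x_t), from a noisy-image classifier
    s               : gradient scale (s > 1 sharpens class conditioning)

    Samples x_{t-1} ~ N(mu + s * sigma * classifier_grad, sigma).
    """
    shifted_mean = mu + s * sigma * classifier_grad
    return shifted_mean + np.sqrt(sigma) * rng.standard_normal(mu.shape)

# Toy usage: with s = 2, sigma = 0.25, and unit gradient, the mean
# shifts by 2 * 0.25 * 1 = 0.5 per coordinate before noise is added.
rng = np.random.default_rng(0)
x_prev = guided_step(np.zeros(4), np.full(4, 0.25), np.ones(4), s=2.0, rng=rng)
```

In the full sampler this step runs once per timestep t = T, ..., 1; the DDIM variant (Algorithm 2) instead perturbs the predicted noise ε_θ(x_t) and takes a deterministic step.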
Open Source Code | Yes | We include the source code and instructions for running it. The code is included in the supplemental material.
Open Datasets | Yes | We train models with the above architecture changes on ImageNet 128×128 and compare them on FID, evaluated at two different points of training, in Table 1. To evaluate our improved model architecture on unconditional image generation, we train separate diffusion models on three LSUN [77] classes: bedroom, horse, and cat. To evaluate classifier guidance, we train conditional diffusion models on the ImageNet [59] dataset at 128×128, 256×256, and 512×512 resolution.
Dataset Splits | No | No explicit training, validation, or test dataset splits (percentages, counts, or detailed methodology) are provided in the main text. Appendix K is mentioned for 'all the training details'.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory) are provided in the main text. The paper directs readers to 'Appendix B' for 'the total amount of compute and the type of resources used'.
Software Dependencies | No | No specific ancillary software details with version numbers are provided in the main text. The paper mentions 'PyTorch [52]' but without a version number.
Experiment Setup | Yes | We train models with the above architecture changes on ImageNet 128×128 and compare them on FID, evaluated at two different points of training, in Table 1. Both conditional and unconditional models were trained for 2M iterations on ImageNet 256×256 with batch size 256. In the rest of the paper, we use this final improved model architecture as our default: variable width with 2 residual blocks per resolution, multiple heads with 64 channels per head, attention at 32, 16 and 8 resolutions, BigGAN residual blocks for up- and downsampling, and adaptive group normalization for injecting timestep and class embeddings into residual blocks.
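The architecture defaults quoted above can be collected into a single config. The sketch below is a hypothetical dictionary summarizing those choices; the field names are illustrative and may not match the identifiers used in the paper's released code:

```python
# Hypothetical config capturing the improved-architecture defaults quoted
# above (ADM). Field names are illustrative, not the released code's API.
ADM_IMPROVED_ARCH = {
    "num_res_blocks": 2,                  # residual blocks per resolution
    "attention_resolutions": (32, 16, 8), # attention at 32x32, 16x16, 8x8 feature maps
    "num_head_channels": 64,              # multi-head attention, 64 channels per head
    "resblock_updown": True,              # BigGAN-style residual up/downsampling
    "adaptive_group_norm": True,          # AdaGN injects timestep/class embeddings
}

# Training-scale settings reported for ImageNet 256x256:
ADM_TRAINING = {"iterations": 2_000_000, "batch_size": 256}
```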