Diffusion Models Beat GANs on Image Synthesis

Authors: Prafulla Dhariwal, Alexander Nichol

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Table 4 summarizes our results. ADM refers to our ablated diffusion model, and ADM-G additionally uses classifier guidance. Our diffusion models can obtain the best FID on each task, and the best sFID on all but one task.
Researcher Affiliation | Industry | Prafulla Dhariwal, OpenAI, prafulla@openai.com; Alex Nichol, OpenAI, alex@openai.com
Pseudocode | Yes | Algorithm 1: Classifier-guided diffusion sampling, given a diffusion model (µ_θ(x_t), Σ_θ(x_t)), classifier p_φ(y|x_t), and gradient scale s. Algorithm 2: Classifier-guided DDIM sampling, given a diffusion model ε_θ(x_t), classifier p_φ(y|x_t), and gradient scale s.
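The sampling step in Algorithm 1 shifts the model's predicted mean by the scaled classifier gradient before drawing the next latent, i.e. x_{t-1} ~ N(µ_θ(x_t) + s Σ_θ(x_t) ∇_{x_t} log p_φ(y|x_t), Σ_θ(x_t)). A minimal numpy sketch of that one step, assuming a diagonal covariance and with `mu`, `sigma`, and `classifier_grad` standing in for the model outputs (all names here are illustrative, not the paper's released API):

```python
import numpy as np

def guided_step(mu, sigma, classifier_grad, s, rng):
    """One classifier-guided sampling step (sketch of Algorithm 1).

    mu              : predicted mean mu_theta(x_t)
    sigma           : diagonal variance Sigma_theta(x_t)
    classifier_grad : grad_{x_t} log p_phi(y | x_t), from a noisy-image classifier
    s               : gradient scale (s > 1 sharpens class conditioning)

    Samples x_{t-1} ~ N(mu + s * sigma * classifier_grad, sigma).
    """
    shifted_mean = mu + s * sigma * classifier_grad
    return shifted_mean + np.sqrt(sigma) * rng.standard_normal(mu.shape)

# Toy usage: with s = 2, sigma = 0.25, and unit gradient, the mean
# shifts by 2 * 0.25 * 1 = 0.5 per coordinate before noise is added.
rng = np.random.default_rng(0)
x_prev = guided_step(np.zeros(4), np.full(4, 0.25), np.ones(4), s=2.0, rng=rng)
```

In the full sampler this step runs once per timestep t = T, ..., 1; the DDIM variant (Algorithm 2) instead perturbs the predicted noise ε_θ(x_t) and takes a deterministic step.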
Open Source Code | Yes | We include the source code and instructions for running it. The code is included in the supplemental material.
Open Datasets | Yes | We train models with the above architecture changes on ImageNet 128×128 and compare them on FID, evaluated at two different points of training, in Table 1. To evaluate our improved model architecture on unconditional image generation, we train separate diffusion models on three LSUN [77] classes: bedroom, horse, and cat. To evaluate classifier guidance, we train conditional diffusion models on the ImageNet [59] dataset at 128×128, 256×256, and 512×512 resolution.
Dataset Splits | No | No explicit training, validation, or test dataset splits (percentages, counts, or detailed methodology) are provided in the main text. Appendix K is mentioned for 'all the training details'.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory) are provided in the main text. The paper directs readers to 'Appendix B' for 'the total amount of compute and the type of resources used'.
Software Dependencies | No | No specific ancillary software details with version numbers are provided in the main text. The paper mentions 'PyTorch [52]' but without a version number.
Experiment Setup | Yes | We train models with the above architecture changes on ImageNet 128×128 and compare them on FID, evaluated at two different points of training, in Table 1. Both conditional and unconditional models were trained for 2M iterations on ImageNet 256×256 with batch size 256. In the rest of the paper, we use this final improved model architecture as our default: variable width with 2 residual blocks per resolution, multiple heads with 64 channels per head, attention at 32, 16 and 8 resolutions, BigGAN residual blocks for up- and downsampling, and adaptive group normalization for injecting timestep and class embeddings into residual blocks.
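The architecture defaults quoted above can be collected into a single config. The sketch below is a hypothetical dictionary summarizing those choices; the field names are illustrative and may not match the identifiers used in the paper's released code:

```python
# Hypothetical config capturing the improved-architecture defaults quoted
# above (ADM). Field names are illustrative, not the released code's API.
ADM_IMPROVED_ARCH = {
    "num_res_blocks": 2,                  # residual blocks per resolution
    "attention_resolutions": (32, 16, 8), # attention at 32x32, 16x16, 8x8 feature maps
    "num_head_channels": 64,              # multi-head attention, 64 channels per head
    "resblock_updown": True,              # BigGAN-style residual up/downsampling
    "adaptive_group_norm": True,          # AdaGN injects timestep/class embeddings
}

# Training-scale settings reported for ImageNet 256x256:
ADM_TRAINING = {"iterations": 2_000_000, "batch_size": 256}
```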