Diffusion Models Beat GANs on Image Synthesis
Authors: Prafulla Dhariwal, Alexander Nichol
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Table 4 summarizes our results. ADM refers to our ablated diffusion model, and ADM-G additionally uses classifier guidance. Our diffusion models can obtain the best FID on each task, and the best sFID on all but one task. |
| Researcher Affiliation | Industry | Prafulla Dhariwal, OpenAI, prafulla@openai.com; Alex Nichol, OpenAI, alex@openai.com |
| Pseudocode | Yes | Algorithm 1: Classifier-guided diffusion sampling, given a diffusion model (μ_θ(x_t), Σ_θ(x_t)), classifier p_φ(y\|x_t), and gradient scale s. Algorithm 2: Classifier-guided DDIM sampling, given a diffusion model ε_θ(x_t), classifier p_φ(y\|x_t), and gradient scale s. (A sketch of the guided step appears after this table.) |
| Open Source Code | Yes | We include the source code and instructions for running it. The code is included in the supplemental material. |
| Open Datasets | Yes | We train models with the above architecture changes on ImageNet 128×128 and compare them on FID, evaluated at two different points of training, in Table 1. To evaluate our improved model architecture on unconditional image generation, we train separate diffusion models on three LSUN [77] classes: bedroom, horse, and cat. To evaluate classifier guidance, we train conditional diffusion models on the ImageNet [59] dataset at 128×128, 256×256, and 512×512 resolution. |
| Dataset Splits | No | No explicit training, validation, or test dataset splits (percentages, counts, or detailed methodology) are provided in the main text. Appendix K is mentioned for 'all the training details'. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory) are provided in the main text. The paper directs readers to 'Appendix B' for 'the total amount of compute and the type of resources used'. |
| Software Dependencies | No | No specific ancillary software details with version numbers are provided in the main text. The paper mentions 'PyTorch [52]' but without a version number. |
| Experiment Setup | Yes | We train models with the above architecture changes on ImageNet 128×128 and compare them on FID, evaluated at two different points of training, in Table 1. Both conditional and unconditional models were trained for 2M iterations on ImageNet 256×256 with batch size 256. In the rest of the paper, we use this final improved model architecture as our default: variable width with 2 residual blocks per resolution, multiple heads with 64 channels per head, attention at the 32×32, 16×16, and 8×8 resolutions, BigGAN residual blocks for up- and downsampling, and adaptive group normalization for injecting timestep and class embeddings into residual blocks. (A configuration sketch follows the table.) |
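
The Algorithm 1 quoted in the Pseudocode row shifts each reverse-diffusion step toward a target class by the gradient of a noisy classifier. Below is a minimal PyTorch sketch of that guided step, assuming hypothetical wrappers `model(x, t) -> (mu, sigma_sq)` for the diffusion model's predicted mean and variance and `classifier(x, t) -> logits` for the noise-aware classifier; this is a sketch under those assumptions, not the authors' released code.

```python
import torch

def classifier_guided_step(model, classifier, x_t, t, y, s):
    """One reverse step of Algorithm 1: sample x_{t-1} from
    N(mu + s * Sigma * grad log p(y | x_t), Sigma)."""
    # Gradient of the class log-probability w.r.t. the noisy input x_t.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = torch.log_softmax(logits, dim=-1)
        selected = log_probs[torch.arange(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]

    mu, sigma_sq = model(x_t, t)      # reverse-process mean and (diagonal) variance
    mean = mu + s * sigma_sq * grad   # shift the mean by s * Sigma * grad log p(y|x_t)
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigma_sq.sqrt() * noise

# Sampling loop: start from pure noise and walk t = T-1, ..., 0.
# x = torch.randn(batch_size, 3, 128, 128)
# for t in reversed(range(T)):
#     x = classifier_guided_step(model, classifier, x, t, y, s=1.0)
```

Algorithm 2 applies the same classifier gradient within DDIM's deterministic update instead of the stochastic step above.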
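
The architecture settings quoted in the Experiment Setup row can also be read as a compact configuration. The sketch below restates them as a plain Python dict; the key names are illustrative placeholders, not the authors' released configuration schema.

```python
# Final improved ADM architecture, as described in the paper's text.
# Key names are illustrative, not the authors' released config schema.
adm_config = {
    "num_res_blocks": 2,                   # 2 residual blocks per resolution
    "attention_resolutions": (32, 16, 8),  # attention at 32x32, 16x16, 8x8 feature maps
    "num_head_channels": 64,               # multi-head attention, 64 channels per head
    "resblock_updown": True,               # BigGAN-style residual up/downsampling blocks
    "use_adaptive_group_norm": True,       # AdaGN injects timestep/class embeddings
    "train_iterations": 2_000_000,         # ImageNet 256x256 models: 2M iterations
    "batch_size": 256,
}
```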