Improved Denoising Diffusion Probabilistic Models

Authors: Alexander Quinn Nichol, Prafulla Dhariwal

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable. We release our code and pre-trained models at https://github.com/openai/improved-diffusion. (A hedged sketch of the learned-variance parameterization appears after the table.)
Researcher Affiliation | Industry | OpenAI, San Francisco, USA. Correspondence to: Alex Nichol <alex@openai.com>, Prafulla Dhariwal <prafulla@openai.com>.
Pseudocode | No | The paper describes its processes and algorithms with mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | We release our code and pre-trained models at https://github.com/openai/improved-diffusion.
Open Datasets | Yes | We train fixed model architectures with fixed hyperparameters on the ImageNet 64×64 (van den Oord et al., 2016) and CIFAR-10 (Krizhevsky, 2009) datasets.
Dataset Splits | Yes | We train fixed model architectures with fixed hyperparameters on the ImageNet 64×64 (van den Oord et al., 2016) and CIFAR-10 (Krizhevsky, 2009) datasets. Figure 10 shows validation NLL throughout training on ImageNet 64×64 for different model sizes.
Hardware Specification | No | The paper does not describe the specific hardware used for its experiments, such as GPU models, CPU types, or cluster specifications. It only notes that sampling takes "several minutes on a modern GPU".
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used in the experiments. It focuses on the algorithmic modifications and their empirical results.
Experiment Setup | Yes | For our experiments, we set λ = 0.001 to prevent L_vlb from overwhelming L_simple. For the remainder of this section, we use T = 4000. To change model capacity, we apply a depth multiplier across all layers, such that the first layer has either 64, 96, 128, or 192 channels. ... we scale the Adam (Kingma & Ba, 2014) learning rate for each model by 1/√(channel multiplier), such that the 128 channel model has a learning rate of 0.0001 (as in our other experiments).
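
The Experiment Setup row collects the paper's stated hyperparameters: λ = 0.001 in the hybrid objective L_hybrid = L_simple + λ·L_vlb, T = 4000 diffusion steps, first-layer widths of 64, 96, 128, or 192 channels, and an Adam learning rate scaled by 1/√(channel multiplier) so that the 128-channel model trains at 0.0001. The sketch below restates that configuration in code. It is a minimal illustration, not the released implementation: it assumes the channel multiplier is measured relative to the 128-channel model (which reproduces the stated 0.0001 learning rate), and the names hybrid_loss, adam_lr, and the constants are ours.

    import math

    LAMBDA_VLB = 0.001         # lambda in L_hybrid = L_simple + lambda * L_vlb
    NUM_DIFFUSION_STEPS = 4000
    BASE_CHANNELS = 128        # assumed baseline for the channel multiplier
    BASE_LR = 1e-4             # Adam learning rate of the 128-channel model

    def hybrid_loss(l_simple, l_vlb, lam=LAMBDA_VLB):
        # A small lambda keeps L_vlb from overwhelming L_simple.
        return l_simple + lam * l_vlb

    def adam_lr(first_layer_channels):
        # Scale the learning rate by 1 / sqrt(channel multiplier).
        multiplier = first_layer_channels / BASE_CHANNELS
        return BASE_LR / math.sqrt(multiplier)

    for width in (64, 96, 128, 192):
        print(f"{width} channels -> Adam lr {adam_lr(width):.2e}")

Under that baseline assumption, the loop prints learning rates of roughly 1.4e-4, 1.2e-4, 1.0e-4, and 8.2e-5 for the 64-, 96-, 128-, and 192-channel models.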
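
The Research Type row quotes the finding that learning the variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes. The paper parameterizes those variances as a log-domain interpolation between βt and the posterior variance β̃t, Σθ(xt, t) = exp(v log βt + (1 − v) log β̃t), on top of a cosine noise schedule. The NumPy sketch below illustrates both pieces as a reading aid under the formulas stated in the paper; it is not the repository code, and the small floor on β̃ at t = 0 is our guard rather than the paper's.

    import numpy as np

    def make_schedule(T=4000, s=0.008):
        # Cosine schedule from the paper: alpha_bar(t) = f(t) / f(0), where
        # f(t) = cos(((t / T + s) / (1 + s)) * pi / 2) ** 2; betas are clipped at 0.999.
        steps = np.arange(T + 1, dtype=np.float64)
        f = np.cos(((steps / T + s) / (1 + s)) * np.pi / 2) ** 2
        alpha_bar = f / f[0]
        betas = np.clip(1.0 - alpha_bar[1:] / alpha_bar[:-1], 0.0, 0.999)
        alpha_bar = alpha_bar[1:]
        # Posterior variances: beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t.
        alpha_bar_prev = np.append(1.0, alpha_bar[:-1])
        beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * betas
        return betas, beta_tilde

    def learned_variance(v, t, betas, beta_tilde):
        # The network predicts v per dimension; the reverse-process variance is
        # exp(v * log(beta_t) + (1 - v) * log(beta_tilde_t)).
        log_beta_tilde = np.log(np.maximum(beta_tilde[t], 1e-20))  # floor at t = 0 is our guard
        return np.exp(v * np.log(betas[t]) + (1.0 - v) * log_beta_tilde)

When sampling with fewer steps, the paper recomputes βt and β̃t along a strided subsequence of timesteps; the sketch above covers only the full T-step schedule.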