Improved Denoising Diffusion Probabilistic Models

Authors: Alexander Quinn Nichol, Prafulla Dhariwal

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable. We release our code and pre-trained models at https://github.com/openai/improved-diffusion. (A hedged sketch of the learned-variance parameterization appears after the table.)
Researcher Affiliation | Industry | OpenAI, San Francisco, USA. Correspondence to: Alex Nichol <alex@openai.com>, Prafulla Dhariwal <prafulla@openai.com>.
Pseudocode | No | The paper describes its processes and algorithms with mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | We release our code and pre-trained models at https://github.com/openai/improved-diffusion.
Open Datasets | Yes | We train fixed model architectures with fixed hyperparameters on the ImageNet 64×64 (van den Oord et al., 2016) and CIFAR-10 (Krizhevsky, 2009) datasets.
Dataset Splits | Yes | We train fixed model architectures with fixed hyperparameters on the ImageNet 64×64 (van den Oord et al., 2016) and CIFAR-10 (Krizhevsky, 2009) datasets. Figure 10 shows validation NLL throughout training on ImageNet 64×64 for different model sizes.
Hardware Specification | No | The paper does not describe the specific hardware used for its experiments, such as GPU models, CPU types, or cluster specifications. It only notes that sampling takes "several minutes on a modern GPU".
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used in the experiments. It focuses on the algorithmic modifications and their empirical results.
Experiment Setup | Yes | For our experiments, we set λ = 0.001 to prevent L_vlb from overwhelming L_simple. For the remainder of this section, we use T = 4000. To change model capacity, we apply a depth multiplier across all layers, such that the first layer has either 64, 96, 128, or 192 channels. ... we scale the Adam (Kingma & Ba, 2014) learning rate for each model by 1/√(channel multiplier), such that the 128 channel model has a learning rate of 0.0001 (as in our other experiments).
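
The Experiment Setup row collects the paper's stated hyperparameters: λ = 0.001 in the hybrid objective L_hybrid = L_simple + λ·L_vlb, T = 4000 diffusion steps, first-layer widths of 64, 96, 128, or 192 channels, and an Adam learning rate scaled by 1/√(channel multiplier) so that the 128-channel model trains at 0.0001. The sketch below restates that configuration in code. It is a minimal illustration, not the released implementation: it assumes the channel multiplier is measured relative to the 128-channel model (which reproduces the stated 0.0001 learning rate), and the names hybrid_loss, adam_lr, and the constants are ours.

    import math

    LAMBDA_VLB = 0.001         # lambda in L_hybrid = L_simple + lambda * L_vlb
    NUM_DIFFUSION_STEPS = 4000
    BASE_CHANNELS = 128        # assumed baseline for the channel multiplier
    BASE_LR = 1e-4             # Adam learning rate of the 128-channel model

    def hybrid_loss(l_simple, l_vlb, lam=LAMBDA_VLB):
        # A small lambda keeps L_vlb from overwhelming L_simple.
        return l_simple + lam * l_vlb

    def adam_lr(first_layer_channels):
        # Scale the learning rate by 1 / sqrt(channel multiplier).
        multiplier = first_layer_channels / BASE_CHANNELS
        return BASE_LR / math.sqrt(multiplier)

    for width in (64, 96, 128, 192):
        print(f"{width} channels -> Adam lr {adam_lr(width):.2e}")

Under that baseline assumption, the loop prints learning rates of roughly 1.4e-4, 1.2e-4, 1.0e-4, and 8.2e-5 for the 64-, 96-, 128-, and 192-channel models.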
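
The Research Type row quotes the finding that learning the variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes. The paper parameterizes those variances as a log-domain interpolation between βt and the posterior variance β̃t, Σθ(xt, t) = exp(v log βt + (1 − v) log β̃t), on top of a cosine noise schedule. The NumPy sketch below illustrates both pieces as a reading aid under the formulas stated in the paper; it is not the repository code, and the small floor on β̃ at t = 0 is our guard rather than the paper's.

    import numpy as np

    def make_schedule(T=4000, s=0.008):
        # Cosine schedule from the paper: alpha_bar(t) = f(t) / f(0), where
        # f(t) = cos(((t / T + s) / (1 + s)) * pi / 2) ** 2; betas are clipped at 0.999.
        steps = np.arange(T + 1, dtype=np.float64)
        f = np.cos(((steps / T + s) / (1 + s)) * np.pi / 2) ** 2
        alpha_bar = f / f[0]
        betas = np.clip(1.0 - alpha_bar[1:] / alpha_bar[:-1], 0.0, 0.999)
        alpha_bar = alpha_bar[1:]
        # Posterior variances: beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t.
        alpha_bar_prev = np.append(1.0, alpha_bar[:-1])
        beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * betas
        return betas, beta_tilde

    def learned_variance(v, t, betas, beta_tilde):
        # The network predicts v per dimension; the reverse-process variance is
        # exp(v * log(beta_t) + (1 - v) * log(beta_tilde_t)).
        log_beta_tilde = np.log(np.maximum(beta_tilde[t], 1e-20))  # floor at t = 0 is our guard
        return np.exp(v * np.log(betas[t]) + (1.0 - v) * log_beta_tilde)

When sampling with fewer steps, the paper recomputes βt and β̃t along a strided subsequence of timesteps; the sketch above covers only the full T-step schedule.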