Learning Energy-Based Models by Diffusion Recovery Likelihood

Authors: Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method generates high fidelity samples on various image datasets. On unconditional CIFAR-10 our method achieves FID 9.58 and inception score 8.30, superior to the majority of GANs. Moreover, we demonstrate that unlike previous work on EBMs, our long-run MCMC samples from the conditional distributions do not diverge and still represent realistic images, allowing us to accurately estimate the normalized density of data even for high-dimensional datasets. Our implementation is available at https://github.com/ruiqigao/recovery_likelihood.
Researcher Affiliation | Collaboration | Ruiqi Gao, UCLA, ruiqigao@ucla.edu; Yang Song, Stanford University, yangsong@cs.stanford.edu; Ben Poole, Google Brain, pooleb@google.com; Ying Nian Wu, UCLA, ywu@stat.ucla.edu; Diederik P. Kingma, Google Brain, durk@google.com
Pseudocode | Yes | Algorithm 1 (Training): repeat: sample t ~ Unif({0, ..., T-1}); sample a pair (y_t, x_{t+1}); initialize the synthesized sample ỹ_t = x_{t+1}; for τ = 1 to K, update ỹ_t according to equation 17; update θ following the gradient ∇_θ f_θ(y_t, t) - ∇_θ f_θ(ỹ_t, t); until converged. Algorithm 2 (Progressive sampling): sample x_T ~ N(0, I); for t = T-1 down to 0: set ỹ_t = x_{t+1}; for τ = 1 to K, update ỹ_t according to equation 17; set x_t = ỹ_t / sqrt(1 - σ²_{t+1}); end for; return x_0. (A minimal sampling sketch appears after the table.)
Open Source Code | Yes | Our implementation is available at https://github.com/ruiqigao/recovery_likelihood.
Open Datasets | Yes | We use the following datasets in our experiments: CIFAR-10 (Krizhevsky et al., 2009), CelebA (Liu et al., 2018) and LSUN (Yu et al., 2015).
Dataset Splits | No | The paper mentions training and test images for CIFAR-10 (50,000 training, 10,000 test) and CelebA (162,770 training, 19,962 test), and test images for LSUN (300 test images), but does not specify a validation split or a methodology for creating one for reproduction. (A hypothetical split sketch appears after the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2014), a network structure based on Wide ResNet (Zagoruyko & Komodakis, 2016), and spectral normalization (Miyato et al., 2018), but does not provide specific software environment details with version numbers (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup | Yes | Training. We use the Adam (Kingma & Ba, 2014) optimizer for all the experiments. We find that for high-resolution images, using a smaller β1 in Adam helps stabilize training. We use learning rate 0.0001 for all the experiments. For the values of β1, batch sizes and the number of training iterations for various datasets, see Table 6. Table 6 (hyperparameters of various datasets; columns: Dataset, N, β1 in Adam, batch size, training iterations): CIFAR-10: 8, 0.9, 256, 240k; CelebA: 6, 0.5, 128, 880k; LSUN church_outdoor 64^2: 2, 0.9, 128, 960k; LSUN bedroom 64^2: 2, 0.9, 128, 760k; LSUN church_outdoor 128^2: 2, 0.5, 64, 840k; LSUN bedroom 128^2: 5, 0.5, 64, 580k. (An illustrative configuration sketch appears after the table.)
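
For reproduction, the sampling loop of Algorithm 2 is compact enough to sketch in code. The following is a minimal PyTorch sketch, not the authors' TensorFlow implementation: the energy network `f_theta`, the noise schedule `sigmas`, and the per-level `step_sizes` are placeholder assumptions, and the Langevin update uses the generic form for the conditional (recovery) log-density rather than the paper's exact equation 17 parameterization.

```python
import torch

def langevin_step(f_theta, y, x_next, t, sigma, step_size):
    """One Langevin update on the conditional (recovery) log-density:
    log p(y | x_next) = f_theta(y, t) - ||y - x_next||^2 / (2 sigma^2) + const."""
    y = y.detach().requires_grad_(True)
    grad_f = torch.autograd.grad(f_theta(y, t).sum(), y)[0]
    grad_log_p = grad_f - (y - x_next) / sigma ** 2
    noise = torch.randn_like(y)
    return (y + 0.5 * step_size ** 2 * grad_log_p + step_size * noise).detach()

def progressive_sampling(f_theta, shape, sigmas, step_sizes, K=30, device="cpu"):
    """Sketch of Algorithm 2: start from N(0, I) and denoise one noise level at a time.
    sigmas has length T+1; step_sizes has length T (assumed schedules)."""
    T = len(sigmas) - 1
    x = torch.randn(shape, device=device)              # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        y = x                                          # initialize y_t = x_{t+1}
        for _ in range(K):                             # K Langevin updates per level
            y = langevin_step(f_theta, y, x, t, sigmas[t + 1], step_sizes[t])
        x = y / (1.0 - sigmas[t + 1] ** 2) ** 0.5      # x_t = y_t / sqrt(1 - sigma_{t+1}^2)
    return x                                           # x_0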
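
Since the paper reports only train/test counts and no validation set, any validation split is a reproduction choice. A hypothetical way to carve one out of the CIFAR-10 training set with torchvision (the 5,000-image size and the seed are arbitrary assumptions, not from the paper):

```python
import torch
from torchvision import datasets, transforms

# Paper reports 50,000 train / 10,000 test CIFAR-10 images; no validation set is specified.
full_train = datasets.CIFAR10("data", train=True, download=True,
                              transform=transforms.ToTensor())
test_set = datasets.CIFAR10("data", train=False, download=True,
                            transform=transforms.ToTensor())
val_size = 5_000  # arbitrary reproduction choice
train_set, val_set = torch.utils.data.random_split(
    full_train, [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))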
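
The Table 6 values can also be collected into a configuration map for reproduction. This is an illustrative sketch only: the dictionary layout, dataset keys, and `make_optimizer` helper are assumptions, while the numeric values and the fixed learning rate of 0.0001 come from the quoted text (the meaning of the "N" column is not specified in the quote).

```python
import torch

# Hyperparameters transcribed from Table 6; keys and structure are illustrative.
HPARAMS = {
    "cifar10":          dict(N=8, beta1=0.9, batch_size=256, iterations=240_000),
    "celeba":           dict(N=6, beta1=0.5, batch_size=128, iterations=880_000),
    "lsun_church_64":   dict(N=2, beta1=0.9, batch_size=128, iterations=960_000),
    "lsun_bedroom_64":  dict(N=2, beta1=0.9, batch_size=128, iterations=760_000),
    "lsun_church_128":  dict(N=2, beta1=0.5, batch_size=64,  iterations=840_000),
    "lsun_bedroom_128": dict(N=5, beta1=0.5, batch_size=64,  iterations=580_000),
}

def make_optimizer(model, dataset="cifar10"):
    """Adam with the paper's fixed learning rate and the dataset-specific beta1."""
    hp = HPARAMS[dataset]
    return torch.optim.Adam(model.parameters(), lr=1e-4, betas=(hp["beta1"], 0.999))
```

For example, `make_optimizer(model, "lsun_bedroom_128")` would use β1 = 0.5; the batch size and iteration count from the same entry would be consumed by the training loop rather than the optimizer.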