Learning Energy-Based Models by Diffusion Recovery Likelihood
Authors: Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method generates high fidelity samples on various image datasets. On unconditional CIFAR-10 our method achieves FID 9.58 and inception score 8.30, superior to the majority of GANs. Moreover, we demonstrate that unlike previous work on EBMs, our long-run MCMC samples from the conditional distributions do not diverge and still represent realistic images, allowing us to accurately estimate the normalized density of data even for high-dimensional datasets. Our implementation is available at https://github.com/ruiqigao/recovery_likelihood. |
| Researcher Affiliation | Collaboration | Ruiqi Gao, UCLA, ruiqigao@ucla.edu; Yang Song, Stanford University, yangsong@cs.stanford.edu; Ben Poole, Google Brain, pooleb@google.com; Ying Nian Wu, UCLA, ywu@stat.ucla.edu; Diederik P. Kingma, Google Brain, durk@google.com |
| Pseudocode | Yes | Algorithm 1 (Training): repeat: sample t ∼ Unif({0, ..., T−1}); sample pair (y_t, x_{t+1}); set synthesized sample ỹ_t = x_{t+1}; for τ = 1 to K: update ỹ_t according to equation 17; update θ following the gradient ∇_θ f_θ(y_t, t) − ∇_θ f_θ(ỹ_t, t); until converged. Algorithm 2 (Progressive sampling): sample x_T ∼ N(0, I); for t = T−1 down to 0: y_t = x_{t+1}; for τ = 1 to K: update y_t according to equation 17; x_t = y_t / √(1 − σ²_{t+1}); return x_0. (A hedged code sketch of both algorithms is given after this table.) |
| Open Source Code | Yes | Our implementation is available at https://github.com/ruiqigao/recovery_likelihood. |
| Open Datasets | Yes | We use the following datasets in our experiments: CIFAR-10 (Krizhevsky et al., 2009), CelebA (Liu et al., 2018) and LSUN (Yu et al., 2015). |
| Dataset Splits | No | The paper mentions training and test images for CIFAR-10 (50,000 training, 10,000 test) and CelebA (162,770 training, 19,962 test), and test images for LSUN (300 test images), but does not specify a validation dataset split or a methodology for creating one for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions using the "Adam (Kingma & Ba, 2014) optimizer", a network structure based on "Wide ResNet (Zagoruyko & Komodakis, 2016)", and "Spectral normalization (Miyato et al., 2018)", but does not provide specific software environment details with version numbers (e.g., Python version, PyTorch/TensorFlow version, CUDA version). |
| Experiment Setup | Yes | Training. We use the Adam (Kingma & Ba, 2014) optimizer for all experiments. We find that for high-resolution images, using a smaller β1 in Adam helps stabilize training. We use a learning rate of 0.0001 for all experiments. For the values of β1, batch sizes and the number of training iterations for various datasets, see Table 6. Table 6 (hyperparameters per dataset, listed as N, β1 in Adam, batch size, training iterations): CIFAR-10: 8, 0.9, 256, 240k; CelebA: 6, 0.5, 128, 880k; LSUN church outdoor 64²: 2, 0.9, 128, 960k; LSUN bedroom 64²: 2, 0.9, 128, 760k; LSUN church outdoor 128²: 2, 0.5, 64, 840k; LSUN bedroom 128²: 5, 0.5, 64, 580k. |
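
Because the pseudocode cell above references "equation 17" without reproducing it, the sketch below fills in Algorithms 1 and 2 in PyTorch under the assumption that equation 17 is a standard Langevin update on the conditional (recovery) distribution p(y_t | x_{t+1}) ∝ exp(f_θ(y_t, t) − ‖y_t − x_{t+1}‖² / (2σ²_{t+1})). The names `f_theta`, `sigmas`, `b` (Langevin step-size factor), and `K` are illustrative placeholders, not identifiers from the authors' repository. This is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch of Algorithms 1 and 2 (diffusion recovery likelihood).
# Assumed, not taken from the paper's code: f_theta(y, t) returns a scalar
# energy per sample, sigmas[t+1] is the noise level between levels t+1 and t,
# b is the Langevin step-size factor, K the number of steps per level.
import torch


def langevin_update(f_theta, y, x_cond, sigma, b, t):
    """One assumed equation-17 step: Langevin dynamics on
    log p(y | x) = f_theta(y, t) - ||y - x||^2 / (2 sigma^2) + const."""
    y = y.detach().requires_grad_(True)
    grad_f = torch.autograd.grad(f_theta(y, t).sum(), y)[0]
    score = grad_f - (y - x_cond) / sigma ** 2       # gradient of log p(y | x)
    noise = torch.randn_like(y)
    return (y + 0.5 * (b * sigma) ** 2 * score + b * sigma * noise).detach()


def training_step(f_theta, y_obs, x_next, t, sigma, b, K, optimizer):
    """Algorithm 1, one iteration: given an observed pair (y_t, x_{t+1})."""
    y_syn = x_next.clone()                           # initialize ỹ_t at x_{t+1}
    for _ in range(K):
        y_syn = langevin_update(f_theta, y_syn, x_next, sigma, b, t)
    # Parameter update ∇_θ f_θ(y_t, t) − ∇_θ f_θ(ỹ_t, t), written as a loss.
    loss = f_theta(y_syn, t).mean() - f_theta(y_obs, t).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def progressive_sample(f_theta, shape, sigmas, b, K):
    """Algorithm 2: sample x_T ~ N(0, I), then refine down to x_0."""
    x = torch.randn(shape)
    for t in reversed(range(len(sigmas) - 1)):       # t = T-1, ..., 0
        sigma = sigmas[t + 1]
        y = x                                        # y_t initialized at x_{t+1}
        for _ in range(K):
            y = langevin_update(f_theta, y, x, sigma, b, t)
        x = y / (1.0 - sigma ** 2) ** 0.5            # x_t = y_t / sqrt(1 − σ²_{t+1})
    return x
```

Minimizing `loss` by gradient descent in `training_step` is equivalent to ascending the stated gradient ∇_θ f_θ(y_t, t) − ∇_θ f_θ(ỹ_t, t); the synthesized sample is detached so only the parameter gradient flows through the energy network.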