Improved Contrastive Divergence Training of Energy-Based Models
Authors: Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform empirical experiments to validate the following set of questions: (1) What are the effects of each proposed component towards training EBMs? (2) Are our trained EBMs able to perform well on downstream applications of EBMs, such as image generation, out-of-distribution detection, and concept compositionality? Table 1: Inception and FID scores for generations of CIFAR-10, CelebA-HQ, and ImageNet 32x32 images. All other numbers are taken directly from corresponding papers. On CIFAR-10, our approach outperforms past EBM approaches and achieves performance close to SNGAN. |
| Researcher Affiliation | Collaboration | MIT CSAIL, Google Brain. Correspondence to: Yilun Du <yilundu@mit.edu>. |
| Pseudocode | Yes | Algorithm 1 (EBM training algorithm) and Algorithm 2 (EBM sampling algorithm) |
| Open Source Code | Yes | Project page and code: https://energy-based-model.github.io/improved-contrastive-divergence/ |
| Open Datasets | Yes | We evaluate our approach on CIFAR-10, ImageNet 32x32 (Deng et al., 2009), and CelebA-HQ (Karras et al., 2017) datasets. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or splitting methodology) is given for training, validation, or test sets in the main text. The paper uses standard datasets (CIFAR-10, ImageNet 32x32, CelebA-HQ) that typically have predefined splits, but does not state them. |
| Hardware Specification | Yes | Models are trained using the Adam Optimizer (Kingma & Ba, 2015), on a single 32GB Volta GPU for CIFAR-10 for 1 day, and for 3 days on 8 32GB Volta GPUs for CelebA-HQ, LSUN, and ImageNet 32x32 datasets. |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., Python, PyTorch, or other libraries) were provided. The paper only mentions 'Adam Optimizer (Kingma & Ba, 2015)' without a version number or other software dependencies. |
| Experiment Setup | Yes | Models are trained using the Adam Optimizer (Kingma & Ba, 2015)... We use a buffer size of 10000, with a resampling rate of 0.1%. Our approach is significantly more stable than IGEBM, allowing us to remove aspects of regularization in (Du & Mordatch, 2019). We remove the clipping of gradients in Langevin sampling as well as spectral normalization on the weights of the network. In addition, we add self-attention blocks and layer normalization blocks in residual networks of our trained models. (A minimal illustrative sketch of this sampling and training loop appears after the table.) |
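
The setup quoted above (replay buffer of 10000 samples, 0.1% resampling rate, Langevin sampling without gradient clipping, Adam optimizer) can be illustrated with a minimal PyTorch-style sketch of a contrastive divergence training step in the spirit of Algorithms 1 and 2. This is a hedged approximation, not the authors' code: the toy network, step size, noise scale, and number of Langevin steps are placeholder assumptions, and the paper's additional KL-divergence and data-augmentation terms are omitted.

```python
# Hypothetical sketch: Langevin sampling from a replay buffer plus a
# contrastive-divergence update with Adam. Hyperparameters other than the
# buffer size (10000) and resampling rate (0.1%) are illustrative guesses.
import torch
import torch.nn as nn

class SimpleEBM(nn.Module):
    """Toy energy network standing in for the paper's residual architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # scalar energy per image

def langevin_sample(ebm, x, n_steps=40, step_size=10.0, noise_scale=0.005):
    """Langevin dynamics; no gradient clipping, per the setup quoted above."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        energy = ebm(x).sum()
        grad, = torch.autograd.grad(energy, x)
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
        x = x.clamp(0, 1).detach().requires_grad_(True)
    return x.detach()

# Replay buffer with the quoted size (10000) and 0.1% resampling rate.
buffer = torch.rand(10000, 3, 32, 32)
ebm = SimpleEBM()
opt = torch.optim.Adam(ebm.parameters(), lr=1e-4)

def train_step(real_images):
    batch = real_images.size(0)
    # Initialize negatives from the buffer, re-initializing ~0.1% from noise.
    idx = torch.randint(0, buffer.size(0), (batch,))
    init = buffer[idx]
    reinit = torch.rand(batch) < 0.001
    init[reinit] = torch.rand_like(init[reinit])

    negatives = langevin_sample(ebm, init)
    buffer[idx] = negatives  # write refined samples back to the buffer

    # Contrastive-divergence objective: push down the energy of data,
    # push up the energy of model samples.
    loss = ebm(real_images).mean() - ebm(negatives).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(train_step(torch.rand(8, 3, 32, 32)))
```

The structure mirrors the quoted pseudocode: negatives are initialized from the replay buffer (with a small fraction re-drawn from noise), refined with Langevin dynamics, written back to the buffer, and contrasted against real data in the loss.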