Improved Contrastive Divergence Training of Energy-Based Models
Authors: Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform empirical experiments to validate the following set of questions: (1) What are the effects of each proposed component towards training EBMs? (2) Are our trained EBMs able to perform well on downstream applications of EBMs, such as image generation, out-of-distribution detection, and concept compositionality? Table 1: Inception and FID scores for generations of CIFAR-10, CelebA-HQ, and ImageNet 32x32 images. All other numbers are taken directly from corresponding papers. On CIFAR-10, our approach outperforms past EBM approaches and achieves performance close to SNGAN. |
| Researcher Affiliation | Collaboration | MIT CSAIL, Google Brain. Correspondence to: Yilun Du <yilundu@mit.edu>. |
| Pseudocode | Yes | Algorithm 1 (EBM training algorithm) and Algorithm 2 (EBM sampling algorithm) |
| Open Source Code | Yes | Project page and code: https://energy-based-model.github.io/improved-contrastive-divergence/ |
| Open Datasets | Yes | We evaluate our approach on CIFAR-10, ImageNet 32x32 (Deng et al., 2009), and CelebA-HQ (Karras et al., 2017) datasets. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or splitting methodology) is given for training, validation, or test sets in the main text. The paper uses standard datasets (CIFAR-10, ImageNet 32x32, CelebA-HQ) that typically have predefined splits, but does not state them. |
| Hardware Specification | Yes | Models are trained using the Adam Optimizer (Kingma & Ba, 2015), on a single 32GB Volta GPU for CIFAR-10 for 1 day, and for 3 days on 8 32GB Volta GPUs for CelebA-HQ, LSUN, and ImageNet 32x32 datasets. |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., Python, PyTorch, or other libraries) were provided. The paper only mentions 'Adam Optimizer (Kingma & Ba, 2015)' without a version number or other software dependencies. |
| Experiment Setup | Yes | Models are trained using the Adam Optimizer (Kingma & Ba, 2015)... We use a buffer size of 10000, with a resampling rate of 0.1%. Our approach is significantly more stable than IGEBM, allowing us to remove aspects of regularization in (Du & Mordatch, 2019). We remove the clipping of gradients in Langevin sampling as well as spectral normalization on the weights of the network. In addition, we add self-attention blocks and layer normalization blocks in residual networks of our trained models. (A minimal illustrative sketch of this sampling and training loop appears after the table.) |
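
The setup quoted above (replay buffer of 10000 samples, 0.1% resampling rate, Langevin sampling without gradient clipping, Adam optimizer) can be illustrated with a minimal PyTorch-style sketch of a contrastive divergence training step in the spirit of Algorithms 1 and 2. This is a hedged approximation, not the authors' code: the toy network, step size, noise scale, and number of Langevin steps are placeholder assumptions, and the paper's additional KL-divergence and data-augmentation terms are omitted.

```python
# Hypothetical sketch: Langevin sampling from a replay buffer plus a
# contrastive-divergence update with Adam. Hyperparameters other than the
# buffer size (10000) and resampling rate (0.1%) are illustrative guesses.
import torch
import torch.nn as nn

class SimpleEBM(nn.Module):
    """Toy energy network standing in for the paper's residual architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # scalar energy per image

def langevin_sample(ebm, x, n_steps=40, step_size=10.0, noise_scale=0.005):
    """Langevin dynamics; no gradient clipping, per the setup quoted above."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        energy = ebm(x).sum()
        grad, = torch.autograd.grad(energy, x)
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
        x = x.clamp(0, 1).detach().requires_grad_(True)
    return x.detach()

# Replay buffer with the quoted size (10000) and 0.1% resampling rate.
buffer = torch.rand(10000, 3, 32, 32)
ebm = SimpleEBM()
opt = torch.optim.Adam(ebm.parameters(), lr=1e-4)

def train_step(real_images):
    batch = real_images.size(0)
    # Initialize negatives from the buffer, re-initializing ~0.1% from noise.
    idx = torch.randint(0, buffer.size(0), (batch,))
    init = buffer[idx]
    reinit = torch.rand(batch) < 0.001
    init[reinit] = torch.rand_like(init[reinit])

    negatives = langevin_sample(ebm, init)
    buffer[idx] = negatives  # write refined samples back to the buffer

    # Contrastive-divergence objective: push down the energy of data,
    # push up the energy of model samples.
    loss = ebm(real_images).mean() - ebm(negatives).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(train_step(torch.rand(8, 3, 32, 32)))
```

The structure mirrors the quoted pseudocode: negatives are initialized from the replay buffer (with a small fraction re-drawn from noise), refined with Langevin dynamics, written back to the buffer, and contrasted against real data in the loss.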