Implicit Generation and Modeling with Energy Based Models

Authors: Yilun Du, Igor Mordatch

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present techniques to scale MCMC based EBM training on continuous neural networks, and we show its success on the high-dimensional data domains of ImageNet 32x32, ImageNet 128x128, CIFAR-10, and robotic hand trajectories, achieving better samples than other likelihood models and nearing the performance of contemporary GAN approaches, while covering all modes of the data. We empirically demonstrate its effectiveness on a series of tasks, including image modeling, trajectory modeling, out-of-distribution detection, continual learning, adversarial robustness and compositional generation.
Researcher Affiliation | Collaboration | Yilun Du (MIT CSAIL) and Igor Mordatch (Google Brain); work done at OpenAI. Correspondence to: yilundu@mit.edu
Pseudocode | Yes | Algorithm 1: Energy training algorithm
Open Source Code | Yes | Additional results, source code, and pre-trained models are available at https://sites.google.com/view/igebm
Open Datasets | Yes | ImageNet 32x32, ImageNet 128x128, CIFAR-10, the dSprites dataset [Higgins et al., 2017], and the Split MNIST task proposed in [Farquhar and Gal, 2018]
Dataset Splits | No | The paper reports train-test splits for some datasets ("We generated 200,000 different trajectories of length 100, from a trained policy (with every 4th action set to a random action for diversity), with a 90-10 train-test split.") but does not explicitly state a validation split for any experiment.
Hardware Specification | No | The paper mentions training a smaller network due to computational constraints but does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | In all our experiments, we sample from B 95% of the time and from uniform noise otherwise. We ran 20 steps of PGD as in [Madry et al., 2017] on the above logits. To undergo classification, we then ran 10 steps of sampling initialized from the starting image (with a bounded deviation of 0.03) from each conditional model, and then classified using the lowest-energy conditional class. We train a conditional EBM with 2 layers of 400 hidden units and compare with a generative conditional VAE baseline with both encoder and decoder having 2 layers of 400 hidden units.
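The sampling scheme quoted above (initialize MCMC chains from a replay buffer B 95% of the time, from uniform noise otherwise, then run Langevin dynamics) can be sketched as follows. This is a minimal illustrative sketch on a toy quadratic energy E(x) = 0.5 * ||x||^2, not the paper's actual network or hyperparameters; all names (`ReplayBuffer`, `langevin_sample`, step sizes) are assumptions for illustration.

```python
import numpy as np

def langevin_sample(x, grad_energy, steps=60, step_size=0.05,
                    noise_scale=0.005, rng=None):
    """Run `steps` of Langevin dynamics: x <- x - step_size * dE/dx + noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    for _ in range(steps):
        x = x - step_size * grad_energy(x) + noise_scale * rng.normal(size=x.shape)
    return x

class ReplayBuffer:
    """Stores past MCMC samples so later chains can resume from them."""
    def __init__(self, capacity=10_000, dim=2, seed=0):
        self.capacity, self.dim = capacity, dim
        self.data = []
        self.rng = np.random.default_rng(seed)

    def init_chains(self, n, p_buffer=0.95):
        """Draw initial states from the buffer with prob. 0.95, else uniform noise."""
        batch = np.empty((n, self.dim))
        for i in range(n):
            if self.data and self.rng.random() < p_buffer:
                batch[i] = self.data[self.rng.integers(len(self.data))]
            else:
                batch[i] = self.rng.uniform(-1.0, 1.0, self.dim)
        return batch

    def push(self, samples):
        """Append new samples, keeping only the most recent `capacity` entries."""
        self.data.extend(np.asarray(samples))
        self.data = self.data[-self.capacity:]

# Toy usage: for E(x) = 0.5 * ||x||^2, grad E(x) = x, so chains contract toward 0.
buffer = ReplayBuffer()
x0 = buffer.init_chains(64)            # first call: buffer empty, all uniform noise
xk = langevin_sample(x0, lambda x: x)  # chains drift toward the low-energy region
buffer.push(xk)                        # store for the next training iteration
```

In training, the energy network's parameters would then be updated to lower the energy of real data relative to these negative samples, and the refreshed chains are pushed back into the buffer so subsequent iterations start closer to the model distribution.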