Bounds all around: training energy-based models with bidirectional bounds
Authors: Cong Geng, Jia Wang, Zhiyong Gao, Jes Frellsen, Søren Hauberg
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the bounds we develop a new and efficient estimator of the Jacobi-determinant of the EBM generator. We demonstrate that these developments stabilize training and yield high-quality density estimation and sample generation. Experimentally, we demonstrate that our approach matches or surpasses state-of-the-art on diverse tasks at negligible performance increase. |
| Researcher Affiliation | Academia | Cong Geng, Jia Wang, Zhiyong Gao (Shanghai Jiao Tong University) {gengcong, jiawang, zhiyong.gao}@sjtu.edu.cn; Jes Frellsen, Søren Hauberg (Technical University of Denmark) {jefr, sohau}@dtu.dk |
| Pseudocode | No | The paper describes algorithms (e.g., LOBPCG) and methods, but does not present them in a pseudocode block or a clearly labeled algorithm format. |
| Open Source Code | Yes | Our implementation is available at https://github.com/gengcong940126/EBM-BB. |
| Open Datasets | Yes | We train our generative model on the Stacked MNIST dataset... on MNIST (Le Cun, 1998)... on the standard benchmark 32x32 CIFAR-10 (Krizhevsky et al., 2009) dataset and the 64x64 cropped ANIMEFACE dataset... |
| Dataset Splits | No | The paper mentions training and test sets and performs cross-validation for some metrics, but it does not explicitly state the split percentages or sample counts for training, validation, and test sets across all experiments. |
| Hardware Specification | Yes | All experiments are conducted on a single 12GB NVIDIA Titan GPU using a pytorch (Paszke et al., 2017) implementation. All models were trained on a single 12GB Titan GPU. |
| Software Dependencies | Yes | All experiments are conducted on a single 12GB NVIDIA Titan GPU using a pytorch (Paszke et al., 2017) implementation. |
| Experiment Setup | Yes | On toy data and MNIST, all models are realized with multi-layer perceptrons (MLPs), while for natural images, we use the convolutional architecture from StudioGAN (Kang and Park, 2020). Details of the network architecture are given in the supplementary material. To improve training, we use a positive margin $\zeta$ for our energy function to balance the bounds. Specifically, we let $\tilde{\mathcal{L}}(\theta) = \mathcal{L}(\theta) + M\,\mathbb{E}_{x \sim p_g(x)}\big[\lVert \nabla_x E_\theta(x) + \nabla_x \log p_g(x) \rVert_p - \zeta\big]_+$, where $[\cdot]_+ = \max(0, \cdot)$ is the usual hinge. In most experiments, $\zeta = 1$ as this allows us to apply the simplified bound in Equation (15). But in general we recommend starting with $\zeta = 0$ and trying successively larger values to stabilize training. (A code sketch of this margin penalty follows the table.) |
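
The Experiment Setup row quotes a hinge-margin penalty on the score difference. The minimal PyTorch sketch below shows how such a term could be computed; `energy_net` (the energy E_θ) and `log_pg` (the generator log-density, which the paper obtains via its Jacobi-determinant estimator) are hypothetical callables, so this is an illustration under those assumptions rather than the authors' implementation.

```python
import torch

def margin_penalty(energy_net, log_pg, x, zeta=1.0, p=2):
    """Hinge-margin penalty on the score difference
    || grad_x E_theta(x) + grad_x log p_g(x) ||_p, averaged over samples x ~ p_g.

    `energy_net` and `log_pg` are hypothetical callables returning per-sample
    energies and generator log-densities; the paper's Jacobi-determinant
    estimator for log p_g is not reproduced here."""
    x = x.detach().requires_grad_(True)
    # gradient of the energy with respect to the input
    grad_E = torch.autograd.grad(energy_net(x).sum(), x, create_graph=True)[0]
    # gradient of the generator log-density with respect to the input
    grad_logpg = torch.autograd.grad(log_pg(x).sum(), x, create_graph=True)[0]
    score_diff = (grad_E + grad_logpg).flatten(start_dim=1)
    norm = score_diff.norm(p=p, dim=1)
    # [ ||.||_p - zeta ]_+  with the usual hinge [.]_+ = max(0, .)
    return torch.clamp(norm - zeta, min=0.0).mean()

# Usage (schematic): total_loss = base_loss + M * margin_penalty(E, log_pg, x_fake, zeta=1.0)
```

Following the quoted recommendation, the penalty would be added to the base loss with weight M, using ζ = 1 by default and starting from ζ = 0 when tuning for stability.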
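
The Research Type and Pseudocode rows mention an LOBPCG-based estimator of the Jacobi determinant of the EBM generator, for which the paper gives no pseudocode. As a rough illustration of the underlying linear-algebra step only, the sketch below applies `torch.lobpcg` to an explicitly materialized Jacobian of a small generator (assumed here to map a single latent vector in R^{d_z} to a flattened sample in R^{d_x}); the paper's estimator is designed to scale to real networks, so this toy version should not be read as the authors' method.

```python
import torch

def extremal_sq_singular_values(generator, z):
    """Largest and smallest eigenvalues of J^T J (squared singular values of the
    generator Jacobian at a single latent z) via torch.lobpcg.
    Explicit-Jacobian sketch only."""
    # J has shape (d_x, d_z) for generator: R^{d_z} -> R^{d_x}; assumes d_z >= 3
    J = torch.autograd.functional.jacobian(generator, z)
    JtJ = J.T @ J  # symmetric positive semi-definite, (d_z, d_z)
    lam_max, _ = torch.lobpcg(JtJ, k=1, largest=True)
    lam_max = lam_max.max()
    # spectral shift: the largest eigenvalue of (lam_max * I - J^T J)
    # equals lam_max - lam_min, which recovers the smallest eigenvalue
    eye = torch.eye(JtJ.shape[0], dtype=JtJ.dtype, device=JtJ.device)
    shifted_max, _ = torch.lobpcg(lam_max * eye - JtJ, k=1, largest=True)
    lam_min = lam_max - shifted_max.max()
    return lam_max, lam_min
```

Since log det(JᵀJ) is the sum of the log-eigenvalues, these extremes give the elementary bracket d_z · log λ_min ≤ log det(JᵀJ) ≤ d_z · log λ_max.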