Energy Discrepancies: A Score-Independent Loss for Energy-Based Models
Authors: Tobias Schröder, Zijing Ou, Jen Lim, Yingzhen Li, Sebastian Vollmer, Andrew Duncan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model. |
| Researcher Affiliation | Academia | (1) Imperial College London, (2) University of Warwick, (3) DFKI and RPTU Kaiserslautern, (4) The Alan Turing Institute |
| Pseudocode | Yes | Algorithm 1: CD-LEBM; Algorithm 2: SM-LEBM; Algorithm 3: ED-LEBM; Algorithm 4: Learning latent space energy-based prior models (a hedged sketch of the ED objective follows the table) |
| Open Source Code | Yes | Code: https://github.com/J-zin/energy-discrepancy |
| Open Datasets | Yes | SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), and CelebA (Liu et al., 2015); MNIST dataset |
| Dataset Splits | Yes | SVHN is of resolution 32 × 32 and contains 73,257 training images and 26,032 test images. CIFAR-10 consists of 50,000 training images and 10,000 test images with a resolution of 32 × 32. For CelebA, which contains 162,770 training images and 19,962 test images, we follow the pre-processing step in (Pang et al., 2020), taking 40,000 examples of CelebA as training data and resizing it to 64 × 64. |
| Hardware Specification | Yes | We choose the largest batch size from {128, 256, 512} such that it can be trained on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but does not provide specific version numbers for software libraries or frameworks like PyTorch or TensorFlow, nor Python versions. |
| Experiment Setup | Yes | The proposed models are trained for 200 epochs using the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 0.0001 for the generator and 0.00005 for the EBM prior. We choose the largest batch size from {128, 256, 512}...For the posterior sampling during training, we use the Langevin sampler with a step size of 0.1 and run it for 20 steps for SVHN and CelebA, and 40 steps on CIFAR-10. We set t = 0.25, M = 16, w = 1 throughout the experiments. (A hedged sketch of this Langevin sampler follows the table.) |
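The quoted hyperparameters (t = 0.25, M = 16, w = 1) correspond to the Gaussian-perturbation estimate of the energy discrepancy loss used in Algorithm 3 (ED-LEBM). Below is a minimal PyTorch sketch, assuming the perturb-and-recover form with w-stabilisation described in the paper; the function name, energy-network signature, and shape conventions are illustrative rather than taken from the released code.

```python
import math
import torch

def energy_discrepancy_loss(energy, x, t=0.25, M=16, w=1.0):
    """Monte Carlo sketch of the energy discrepancy loss: Gaussian
    perturbation of variance t, M recovery samples per data point,
    and w-stabilisation inside the logarithm."""
    B, D = x.shape
    # Perturb the data: y = x + sqrt(t) * xi, with xi ~ N(0, I)
    y = x + math.sqrt(t) * torch.randn_like(x)
    # Recovery samples around each perturbed point: x'_j = y + sqrt(t) * xi'_j
    x_neg = y.unsqueeze(1) + math.sqrt(t) * torch.randn(B, M, D, device=x.device)
    # Energy differences U(x) - U(x'_j)
    pos = energy(x)                                      # shape (B,)
    neg = energy(x_neg.reshape(B * M, D)).reshape(B, M)  # shape (B, M)
    diff = pos.unsqueeze(1) - neg                        # shape (B, M)
    # Numerically stable log( w/M + (1/M) * sum_j exp(diff_j) )
    logw = torch.full((B, 1), math.log(w), dtype=x.dtype, device=x.device)
    loss = torch.logsumexp(torch.cat([diff, logw], dim=1), dim=1) - math.log(M)
    return loss.mean()
```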
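The reported Langevin settings (step size 0.1; 20 steps on SVHN and CelebA, 40 on CIFAR-10) refer to short-run posterior sampling over the latent variable when training the EBM prior. The sketch below, again in PyTorch, uses the standard s²/2-drift, s-noise discretisation; whether the reported 0.1 plays the role of s or s² in the released code is not specified in the quote, and `log_joint` is an assumed helper returning log p_α(z) + log p_β(x | z) up to an additive constant.

```python
import torch

def langevin_posterior_sample(z0, x, log_joint, step_size=0.1, n_steps=20):
    """Short-run Langevin sampler for the latent posterior p(z | x)."""
    z = z0.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        # Gradient of the unnormalised log-posterior with respect to z
        grad = torch.autograd.grad(log_joint(z, x).sum(), z)[0]
        with torch.no_grad():
            # z <- z + (s^2 / 2) * grad + s * noise
            z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```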