Energy Discrepancies: A Score-Independent Loss for Energy-Based Models
Authors: Tobias Schröder, Zijing Ou, Jen Lim, Yingzhen Li, Sebastian Vollmer, Andrew Duncan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model. |
| Researcher Affiliation | Academia | (1) Imperial College London, (2) University of Warwick, (3) DFKI and RPTU Kaiserslautern, (4) The Alan Turing Institute |
| Pseudocode | Yes | Algorithm 1: CD-LEBM; Algorithm 2: SM-LEBM; Algorithm 3: ED-LEBM; Algorithm 4: Learning latent space energy-based prior models (a hedged sketch of the ED objective follows the table) |
| Open Source Code | Yes | Code: https://github.com/J-zin/energy-discrepancy |
| Open Datasets | Yes | SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), and CelebA (Liu et al., 2015); MNIST dataset |
| Dataset Splits | Yes | SVHN is of resolution 32 × 32 and contains 73,257 training images and 26,032 test images. CIFAR-10 consists of 50,000 training images and 10,000 test images with a resolution of 32 × 32. For CelebA, which contains 162,770 training images and 19,962 test images, we follow the pre-processing step in (Pang et al., 2020), taking 40,000 examples of CelebA as training data and resizing it to 64 × 64. |
| Hardware Specification | Yes | We choose the largest batch size from {128, 256, 512} such that it can be trained on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but does not provide specific version numbers for software libraries or frameworks like PyTorch or TensorFlow, nor Python versions. |
| Experiment Setup | Yes | The proposed models are trained for 200 epochs using the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 0.0001 for the generator and 0.00005 for the EBM prior. We choose the largest batch size from {128, 256, 512}...For the posterior sampling during training, we use the Langevin sampler with a step size of 0.1 and run it for 20 steps for SVHN and CelebA, and 40 steps on CIFAR-10. We set t = 0.25, M = 16, w = 1 throughout the experiments. (A hedged sketch of this Langevin sampler follows the table.) |
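The quoted hyperparameters (t = 0.25, M = 16, w = 1) correspond to the Gaussian-perturbation estimate of the energy discrepancy loss used in Algorithm 3 (ED-LEBM). Below is a minimal PyTorch sketch, assuming the perturb-and-recover form with w-stabilisation described in the paper; the function name, energy-network signature, and shape conventions are illustrative rather than taken from the released code.

```python
import math
import torch

def energy_discrepancy_loss(energy, x, t=0.25, M=16, w=1.0):
    """Monte Carlo sketch of the energy discrepancy loss: Gaussian
    perturbation of variance t, M recovery samples per data point,
    and w-stabilisation inside the logarithm."""
    B, D = x.shape
    # Perturb the data: y = x + sqrt(t) * xi, with xi ~ N(0, I)
    y = x + math.sqrt(t) * torch.randn_like(x)
    # Recovery samples around each perturbed point: x'_j = y + sqrt(t) * xi'_j
    x_neg = y.unsqueeze(1) + math.sqrt(t) * torch.randn(B, M, D, device=x.device)
    # Energy differences U(x) - U(x'_j)
    pos = energy(x)                                      # shape (B,)
    neg = energy(x_neg.reshape(B * M, D)).reshape(B, M)  # shape (B, M)
    diff = pos.unsqueeze(1) - neg                        # shape (B, M)
    # Numerically stable log( w/M + (1/M) * sum_j exp(diff_j) )
    logw = torch.full((B, 1), math.log(w), dtype=x.dtype, device=x.device)
    loss = torch.logsumexp(torch.cat([diff, logw], dim=1), dim=1) - math.log(M)
    return loss.mean()
```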
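The reported Langevin settings (step size 0.1; 20 steps on SVHN and CelebA, 40 on CIFAR-10) refer to short-run posterior sampling over the latent variable when training the EBM prior. The sketch below, again in PyTorch, uses the standard s²/2-drift, s-noise discretisation; whether the reported 0.1 plays the role of s or s² in the released code is not specified in the quote, and `log_joint` is an assumed helper returning log p_α(z) + log p_β(x | z) up to an additive constant.

```python
import torch

def langevin_posterior_sample(z0, x, log_joint, step_size=0.1, n_steps=20):
    """Short-run Langevin sampler for the latent posterior p(z | x)."""
    z = z0.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        # Gradient of the unnormalised log-posterior with respect to z
        grad = torch.autograd.grad(log_joint(z, x).sum(), z)[0]
        with torch.no_grad():
            # z <- z + (s^2 / 2) * grad + s * noise
            z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```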