Generalized Energy Based Models
Authors: Michael Arbel, Liang Zhou, Arthur Gretton
Venue: ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, the GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity. When using normalizing flows as base measures, GEBMs succeed on density modelling tasks, returning comparable performance to direct maximum likelihood of the same networks. ... Finally, experimental results are presented in Section 6 with code available at https://github.com/MichaelArbel/GeneralizedEBM. |
| Researcher Affiliation | Academia | Michael Arbel, Liang Zhou & Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London |
| Pseudocode | Yes | Algorithm 1 Training GEBM ... Algorithm 2 Overdamped Langevin Algorithm ... Algorithm 3 Kinetic Langevin Algorithm |
| Open Source Code | Yes | Finally, experimental results are presented in Section 6 with code available at https://github.com/MichaelArbel/GeneralizedEBM. |
| Open Datasets | Yes | We consider CIFAR-10 (Krizhevsky, 2009), LSUN (Yu et al., 2015), CelebA (Liu et al., 2015) and ImageNet (Russakovsky et al., 2014), all downsampled to 32x32 resolution to reduce computational cost. ... For Red Wine and White Wine, we added uniform noise... For HepMass and MiniBooNE, we removed ill-conditioned dimensions as also done in Papamakarios et al. (2017). We split all datasets, except HepMass, into three splits. |
| Dataset Splits | Yes | We split all datasets, except HepMass, into three splits. The test split consists of 10% of the total data. For the validation set, we use 10% of the remaining data with an upper limit of 1000 to reduce the cost of validation at each iteration. (A sketch of this split rule is given below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" but does not specify its version or any other software dependencies (e.g., programming language versions, specific libraries like PyTorch or TensorFlow versions). |
| Experiment Setup | Yes | We train the models for 150000 generator iterations using Algorithm 1. After training is completed, we rescale the energy by β = 100 to get a colder version of the GEBM and sample from it using either Algorithm 2 (ULA) or Algorithm 3 (KLA) with parameters (γ = 100, u = 1). ... We perform 1000 MCMC iterations with initial step-size of λ = 10⁻⁴ decreased by 10 every 200 iterations. ... We train both base and energy by alternating 5 gradient steps to learn the energy vs 1 gradient step to learn the base. For the first two gradient iterations and after every 500 gradient iterations on base, we train the energy for 100 gradient steps instead of 5. We then train the model up to 150000 gradient iterations on the base using a batch-size of 128 and the Adam optimizer (Kingma and Ba, 2014) with initial learning rate of 10⁻⁴ and parameters (0.5, 0.999) for both energy and base. (Training and sampling sketches based on these settings follow the table.) |
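The split rule quoted in the Dataset Splits row is concrete enough to express as code. The sketch below implements that rule (10% of the data for test, then 10% of the remainder, capped at 1000 points, for validation); the shuffling, seed, and function name are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def split_dataset(data, seed=0):
    """Split rule quoted in the paper: 10% of the data for test, then 10% of
    the remaining data (capped at 1000 points) for validation. Shuffling and
    seeding here are assumptions, not taken from the authors' code."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))

    n_test = int(0.10 * len(data))                 # test split: 10% of the total data
    test_idx, rest_idx = idx[:n_test], idx[n_test:]

    n_val = min(int(0.10 * len(rest_idx)), 1000)   # validation: 10% of the rest, at most 1000
    val_idx, train_idx = rest_idx[:n_val], rest_idx[n_val:]

    return data[train_idx], data[val_idx], data[test_idx]
```

For example, `train, val, test = split_dataset(np.asarray(features))` on one of the UCI tables would reproduce the quoted proportions.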
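The alternating training schedule in the Experiment Setup row (5 energy steps per base step, with bursts of 100 energy steps on the first two and every 500th base iteration; Adam with learning rate 10⁻⁴, betas (0.5, 0.999), and batch size 128) can be sketched as below. The toy networks and the WGAN-style placeholder loss are assumptions made only to keep the sketch runnable; the paper's actual objective and architectures are defined in the paper and at https://github.com/MichaelArbel/GeneralizedEBM.

```python
import torch
from torch import nn

# Toy stand-ins for the paper's energy network and base (generator).
latent_dim, data_dim, batch_size = 32, 64, 128
energy = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))
base = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))

# Optimizer settings quoted in the Experiment Setup row.
opt_energy = torch.optim.Adam(energy.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_base = torch.optim.Adam(base.parameters(), lr=1e-4, betas=(0.5, 0.999))

def sample_data():
    return torch.randn(batch_size, data_dim)       # placeholder for real minibatches

def critic_loss(real, fake):
    # Placeholder WGAN-style critic objective, used here only to make the
    # schedule runnable; the paper trains the energy with its own loss.
    return energy(fake).mean() - energy(real).mean()

for it in range(150_000):                          # 150000 gradient iterations on the base
    # 100 energy steps on the first two iterations and after every 500 base
    # iterations, otherwise 5 energy steps per base step.
    n_energy_steps = 100 if (it < 2 or it % 500 == 0) else 5
    for _ in range(n_energy_steps):
        fake = base(torch.randn(batch_size, latent_dim))
        loss_e = critic_loss(sample_data(), fake.detach())
        opt_energy.zero_grad()
        loss_e.backward()
        opt_energy.step()

    fake = base(torch.randn(batch_size, latent_dim))
    loss_b = -critic_loss(sample_data(), fake)     # only the fake term carries a base gradient
    opt_base.zero_grad()
    loss_b.backward()
    opt_base.step()
```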
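The sampling step (rescale the energy by β = 100, then run 1000 MCMC iterations with a step size of 10⁻⁴ divided by 10 every 200 iterations) can be sketched as an overdamped Langevin update in latent space, in the spirit of Algorithm 2. The potential below (rescaled energy of the generated point plus a standard Gaussian latent prior) and its sign convention are assumptions made for illustration; the paper's Algorithms 2 and 3 define the exact updates, and Algorithm 3 (KLA, with γ = 100, u = 1) adds a momentum variable not shown here.

```python
import math
import torch

def ula_sample(energy, base, z0, beta=100.0, n_steps=1000, lam0=1e-4):
    """Overdamped Langevin (ULA) sketch over the latent space: 1000 iterations
    with the step size divided by 10 every 200 iterations and the energy
    rescaled by beta = 100, as quoted in the Experiment Setup row. The
    potential and its sign are illustrative assumptions."""
    z = z0.clone()
    for t in range(n_steps):
        lam = lam0 / (10 ** (t // 200))            # step-size schedule
        z.requires_grad_(True)
        # Assumed potential: rescaled energy of the generated sample plus a
        # standard Gaussian prior on the latent.
        potential = beta * energy(base(z)).sum() + 0.5 * (z ** 2).sum()
        grad = torch.autograd.grad(potential, z)[0]
        with torch.no_grad():
            z = z - lam * grad + math.sqrt(2.0 * lam) * torch.randn_like(z)
    return base(z).detach()
```

Given the toy `energy` and `base` from the previous sketch, `samples = ula_sample(energy, base, torch.randn(64, latent_dim))` produces a batch of GEBM-style samples.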