Efficient Training of Energy-Based Models Using Jarzynski Equality
Authors: Davide Carbone, Mengjian Hua, Simon Coste, Eric Vanden-Eijnden
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate these results with numerical experiments on Gaussian mixture distributions as well as the MNIST and CIFAR-10 datasets. |
| Researcher Affiliation | Academia | Davide Carbone (Dipartimento di Scienze Matematiche, Politecnico di Torino; Istituto Nazionale di Fisica Nucleare, Sezione di Torino) davide.carbone@polito.it; Mengjian Hua (Courant Institute of Mathematical Sciences, New York University) mh5113@nyu.edu; Simon Coste (LPSM, Université Paris-Cité) simon.coste@u-paris.fr; Eric Vanden-Eijnden (Courant Institute of Mathematical Sciences, New York University) eve2@nyu.edu |
| Pseudocode | Yes | Algorithm 1: Sequential Monte-Carlo training with Jarzynski correction (an illustrative sketch of one such training step follows the table). |
| Open Source Code | Yes | The code used to perform these new experiments is available in the anonymized GitHub repository referenced in our paper. Up-to-date images available at https://github.com/submissionx12/EBMs_Jarzynski. |
| Open Datasets | Yes | Next, we perform empirical experiments on the MNIST dataset to answer the following question: when it comes to high-dimensional datasets with multiple modes, can our method produce an EBM that generates high-quality samples and captures the relative weights of the modes accurately? |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. |
| Hardware Specification | Yes | All the experiments were performed on a single A100 GPU. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) were found. |
| Experiment Setup | Yes | The hyperparameters are the same in all cases: we take N = 4096 Langevin walkers with a mini-batch size n = 256. We use the Adam optimizer with learning rate η = 10⁻⁴ and inject Gaussian noise of standard deviation σ = 3 × 10⁻² into the dataset while performing gradient clipping in Langevin sampling for better performance. All the experiments were performed on a single A100 GPU. Training for 600 epochs took about 34 hours with the PCD algorithm (w/ and w/o data augmentation) and about 36 hours with our method. |
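
The Pseudocode row points to the paper's Algorithm 1 (Sequential Monte-Carlo training with Jarzynski correction). As a rough illustration only, the PyTorch sketch below shows one training iteration in that style: a Langevin move of the walkers, a weighted cross-entropy gradient step, a simplified Jarzynski log-weight update, and resampling when the effective sample size collapses. All names (`jarzynski_smc_step`, `walkers`, `log_w`, `h`) are hypothetical, and the weight update omits the exact ULA transition terms used in the paper.

```python
import torch

def jarzynski_smc_step(energy, optimizer, walkers, log_w, data_batch, h=1e-2):
    """One sketched iteration in the style of Algorithm 1 (names are illustrative).

    energy : nn.Module returning one scalar energy per sample, shape (N,)
    walkers: (N, d) persistent Langevin walkers; log_w: (N,) Jarzynski log-weights
    """
    # 1) One unadjusted Langevin step per walker under the current energy E_theta.
    #    (The reported setup also clips this gradient; omitted here for brevity.)
    walkers = walkers.detach().requires_grad_(True)
    grad_x, = torch.autograd.grad(energy(walkers).sum(), walkers)
    walkers = (walkers - h * grad_x
               + (2.0 * h) ** 0.5 * torch.randn_like(walkers)).detach()

    # 2) Cross-entropy gradient step: data average of the energy minus the
    #    Jarzynski-weighted average of the energy over the walkers.
    w = torch.softmax(log_w, dim=0)
    e_walk_old = energy(walkers)                       # energies under theta_k
    loss = energy(data_batch).mean() - (w * e_walk_old).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                   # theta_k -> theta_{k+1}

    # 3) Simplified Jarzynski log-weight update: energy change of each walker
    #    caused by the parameter update (the paper's exact rule also carries
    #    ULA transition terms, omitted in this simplification).
    with torch.no_grad():
        log_w = log_w + (e_walk_old - energy(walkers))

        # 4) Multinomial resampling when the normalized effective sample size drops.
        w = torch.softmax(log_w, dim=0)
        if 1.0 / (w.pow(2).sum() * len(w)) < 0.5:
            idx = torch.multinomial(w, num_samples=len(w), replacement=True)
            walkers, log_w = walkers[idx], torch.zeros_like(log_w)

    return walkers, log_w
```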
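
The hyperparameters quoted in the Experiment Setup row could be wired up as in the following sketch. The constants come from the reported setup (4096 walkers, mini-batch 256, Adam with η = 10⁻⁴, data noise σ = 3 × 10⁻²); the toy energy network, the flattened 28×28 input, and all names are placeholders, not the authors' architecture or code.

```python
import torch
from torch.optim import Adam

# Constants from the reported setup; everything else here is a placeholder.
N_WALKERS = 4096      # persistent Langevin walkers
BATCH     = 256       # mini-batch size
LR        = 1e-4      # Adam learning rate (eta)
NOISE_STD = 3e-2      # std of Gaussian noise injected into the data (sigma)

class ToyEnergy(torch.nn.Module):
    """Stand-in energy network; the paper's MNIST/CIFAR-10 models differ."""
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 512), torch.nn.SiLU(),
            torch.nn.Linear(512, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)    # one scalar energy per sample

energy    = ToyEnergy(dim=28 * 28)                  # e.g. flattened MNIST digits
optimizer = Adam(energy.parameters(), lr=LR)
walkers   = torch.randn(N_WALKERS, 28 * 28)         # persistent walker pool
log_w     = torch.zeros(N_WALKERS)                  # Jarzynski log-weights

def noisy(batch):
    """Inject Gaussian noise of std 3e-2 into each data mini-batch, as reported."""
    return batch + NOISE_STD * torch.randn_like(batch)

# One iteration with the step sketched above (data_batch: a (BATCH, 784) tensor):
# walkers, log_w = jarzynski_smc_step(energy, optimizer, walkers, log_w, noisy(data_batch))
```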