Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Training of Energy-Based Models Using Jarzynski Equality
Authors: Davide Carbone, Mengjian Hua, Simon Coste, Eric Vanden-Eijnden
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate these results with numerical experiments on Gaussian mixture distributions as well as the MNIST and CIFAR-10 datasets. |
| Researcher Affiliation | Academia | Davide Carbone (Dipartimento di Scienze Matematiche, Politecnico di Torino; Istituto Nazionale di Fisica Nucleare, Sezione di Torino); Mengjian Hua (Courant Institute of Mathematical Sciences, New York University); Simon Coste (LPSM, Université Paris-Cité); Eric Vanden-Eijnden (Courant Institute of Mathematical Sciences, New York University) |
| Pseudocode | Yes (a hedged sketch follows the table) | Algorithm 1 Sequential Monte-Carlo training with Jarzynski correction |
| Open Source Code | Yes | The code used to perform these new experiments is available in the anonymized GitHub repository referenced in our paper. Up-to-date images available at https://github.com/submissionx12/EBMs_Jarzynski. |
| Open Datasets | Yes | Next, we perform empirical experiments on the MNIST dataset to answer the following question: when it comes to high-dimensional datasets with multiple modes, can our method produce an EBM that generates high-quality samples and captures the relative weights of the modes accurately? |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. |
| Hardware Specification | Yes | All the experiments were performed on a single A100 GPU. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) were found. |
| Experiment Setup | Yes (hyperparameters reused in the sketch below) | The hyperparameters are the same in all cases: we take N = 4096 Langevin walkers with a mini-batch size n = 256. We use the Adam optimizer with learning rate η = 10⁻⁴ and inject Gaussian noise of standard deviation σ = 3·10⁻² into the dataset while performing gradient clipping in Langevin sampling for better performance. All the experiments were performed on a single A100 GPU. Training for 600 epochs took about 34 hours with the PCD algorithm (w/ and w/o data augmentation) and about 36 hours with our method. |
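The Pseudocode and Experiment Setup rows above describe the paper's training loop only at a high level. As a reading aid, here is a minimal PyTorch sketch of sequential Monte-Carlo EBM training with Jarzynski (AIS-style) reweighting of unadjusted Langevin walkers. It is not the authors' released code (see the repository linked above): the energy network, step size `h`, tensor shapes, and the helper names `energy_and_grad` and `train_step` are illustrative assumptions; only the hyperparameters quoted in the table (N = 4096 walkers, Adam with η = 10⁻⁴, data noise σ = 3·10⁻²) come from the paper.

```python
import torch

def energy_and_grad(net, x):
    """Return U_theta(x) and grad_x U_theta(x) for a batch of walkers."""
    x = x.detach().requires_grad_(True)
    u = net(x).squeeze(-1)                          # energies, shape (N,)
    (g,) = torch.autograd.grad(u.sum(), x)
    return u.detach(), g.detach()

def train_step(net, opt, data_batch, walkers, log_w, h=1e-2):
    """One parameter update plus one reweighted ULA step for all walkers."""
    # 1) Self-normalized importance weights of the walkers.
    w = torch.softmax(log_w, dim=0)

    # 2) Weighted contrastive loss: E_data[U] - sum_i w_i U(X_i).
    u_walkers = net(walkers).squeeze(-1)
    loss = net(data_batch).squeeze(-1).mean() - (w * u_walkers).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    u_prev = u_walkers.detach()                     # U_{theta_k}(X_k)

    # 3) Unadjusted Langevin (ULA) step under the updated energy.
    u_old, g_old = energy_and_grad(net, walkers)    # U_{theta_{k+1}}(X_k)
    new_walkers = (walkers - h * g_old
                   + (2 * h) ** 0.5 * torch.randn_like(walkers))
    u_new, g_new = energy_and_grad(net, new_walkers)

    # 4) Jarzynski log-weight update, two contributions:
    #    (a) the target moved from theta_k to theta_{k+1}: u_prev - u_old;
    #    (b) the AIS correction for the ULA transition kernel:
    #        U(x) - U(y) + 0.5 (y-x).(gU(x)+gU(y))
    #        + (h/4) (|gU(x)|^2 - |gU(y)|^2).
    d = (new_walkers - walkers).flatten(1)
    log_w = (log_w + (u_prev - u_old) + (u_old - u_new)
             + 0.5 * (d * (g_old + g_new).flatten(1)).sum(dim=1)
             + 0.25 * h * (g_old.flatten(1).pow(2).sum(1)
                           - g_new.flatten(1).pow(2).sum(1)))
    return new_walkers, log_w, loss.item()

# Illustrative usage with the table's hyperparameters (flattened MNIST):
# net = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.SiLU(),
#                           torch.nn.Linear(256, 1))
# opt = torch.optim.Adam(net.parameters(), lr=1e-4)           # eta = 1e-4
# walkers, log_w = torch.randn(4096, 784), torch.zeros(4096)  # N = 4096
# noisy = batch + 3e-2 * torch.randn_like(batch)              # sigma = 3e-2
# walkers, log_w, loss = train_step(net, opt, noisy, walkers, log_w)
```

In practice one would typically also resample the walkers when the effective sample size collapses, subsample a mini-batch of 256 walkers per gradient step, and clip the Langevin gradients as the setup row mentions; all three are omitted above to keep the incremental weight update exact and the sketch short.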