On Energy-Based Models with Overparametrized Shallow Neural Networks
Authors: Carles Domingo-Enrich, Alberto Bietti, Eric Vanden-Eijnden, Joan Bruna
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study covers both maximum likelihood and Stein Discrepancy estimators, and we validate our theoretical results with numerical experiments on synthetic data. From Section 6 (Experiments): In this section, we present numerical experiments illustrating our theory on simple synthetic datasets generated by teacher models with energies $f(x) = \frac{1}{J}\sum_{j=1}^{J} w_j \, \sigma(\langle \theta_j, x \rangle)$, with $\theta_i \in \mathbb{S}^d$ for all $i$ (see the teacher-energy sketch below the table). |
| Researcher Affiliation | Academia | 1Courant Institute of Mathematical Sciences, New York University 2Center for Data Science, New York University. |
| Pseudocode | Yes | Algorithm 1 Generic algorithm to train F1 EBMs |
| Open Source Code | Yes | The code for the experiments is in https://github.com/CDEnrich/ebms_shallow_nn. |
| Open Datasets | No | The paper uses 'simple synthetic datasets generated by teacher models' and describes the generation process ('We generate data on the sphere Sd from teacher models by using a simple rejection sampling strategy'), but does not provide access information (link, DOI, repository, or formal citation) for a publicly available or open dataset. |
| Dataset Splits | Yes | We report test metrics after selecting hyperparameters on a validation set of 2000 samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It discusses computational aspects generally but without concrete specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software, libraries, or dependencies (e.g., Python, PyTorch, TensorFlow, etc.) used to conduct the experiments, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For different numbers of training samples, we run our gradient-based algorithms in F1 and F2 with different choices of step-sizes and regularization parameters λ, using m = 500 neurons. We report test metrics after selecting hyperparameters on a validation set of 2000 samples. For computing gradients in maximum likelihood training, we use a simple Metropolis-Hastings algorithm with uniform proposals on the sphere. |
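
The teacher model quoted in the Research Type row defines the energy $f(x) = \frac{1}{J}\sum_{j=1}^{J} w_j \, \sigma(\langle \theta_j, x \rangle)$ with unit-norm neuron directions $\theta_j \in \mathbb{S}^d$. The block below is a minimal sketch of that energy, assuming a ReLU activation and NumPy conventions; the function and argument names are illustrative and not taken from the authors' repository.

```python
import numpy as np

def teacher_energy(x, thetas, weights):
    """Teacher energy f(x) = (1/J) * sum_j w_j * sigma(<theta_j, x>).

    x:       point on the sphere S^d, shape (d+1,)
    thetas:  neuron directions on S^d, shape (J, d+1)
    weights: output weights w_j, shape (J,)
    sigma is taken to be ReLU here (an assumption; the paper treats
    more general activations)."""
    pre_activations = thetas @ x                 # <theta_j, x> for each neuron
    return np.mean(weights * np.maximum(pre_activations, 0.0))

def sample_uniform_sphere(n, d, rng):
    """Draw n points uniformly on S^d (unit sphere embedded in R^{d+1})."""
    z = rng.standard_normal((n, d + 1))
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```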
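The Open Datasets row quotes the paper as generating data on $\mathbb{S}^d$ from teacher models "by using a simple rejection sampling strategy". The following is a hedged sketch of one such strategy; it reuses `teacher_energy` and `sample_uniform_sphere` from the block above and assumes the target density is proportional to $\exp(-f(x))$, a conventional EBM sign choice rather than a detail stated in the paper's quoted text.

```python
def rejection_sample_teacher(n, thetas, weights, rng):
    """Rejection sampling from a density on S^d assumed proportional to
    exp(-f(x)), using uniform proposals on the sphere.

    The envelope constant relies on |f(x)| <= max_j |w_j|, which holds
    for ReLU activations with unit-norm thetas and x."""
    d = thetas.shape[1] - 1
    log_envelope = np.max(np.abs(weights))       # upper bound on -f(x)
    samples = []
    while len(samples) < n:
        x = sample_uniform_sphere(1, d, rng)[0]
        log_accept = -teacher_energy(x, thetas, weights) - log_envelope
        if np.log(rng.uniform()) < log_accept:   # accept w.p. exp(log_accept)
            samples.append(x)
    return np.stack(samples)
```

As a usage example, `rejection_sample_teacher(2000, thetas, weights, np.random.default_rng(0))` would produce a held-out set of the size reported in the Dataset Splits row; the actual teacher configurations are described in the paper's Section 6.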
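The Experiment Setup row states that gradients for maximum-likelihood training are computed with "a simple Metropolis-Hastings algorithm with uniform proposals on the sphere". The sketch below shows one way such a sampler could look: because the uniform proposal is symmetric, the acceptance ratio reduces to an energy difference. The sign convention (density $\propto \exp(-f)$) and all names are assumptions, not the authors' implementation.

```python
def mh_uniform_sphere(energy_fn, n_chains, n_steps, d, rng):
    """Metropolis-Hastings on S^d with uniform proposals, targeting a
    density assumed proportional to exp(-energy_fn(x)).

    Returns n_chains approximate samples after n_steps MH updates; such
    negative samples can be used to estimate the gradient of the
    log-partition function in maximum-likelihood training."""
    x = sample_uniform_sphere(n_chains, d, rng)
    energies = np.array([energy_fn(xi) for xi in x])
    for _ in range(n_steps):
        proposals = sample_uniform_sphere(n_chains, d, rng)
        proposal_energies = np.array([energy_fn(xi) for xi in proposals])
        log_alpha = energies - proposal_energies        # symmetric proposal
        accept = np.log(rng.uniform(size=n_chains)) < log_alpha
        x[accept] = proposals[accept]
        energies[accept] = proposal_energies[accept]
    return x
```

Uniform proposals keep the chain simple but can mix slowly in high dimension; the paper's quoted setup gives no further sampler details, so the number of chains and steps are left here as free parameters.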