On Energy-Based Models with Overparametrized Shallow Neural Networks

Authors: Carles Domingo-Enrich, Alberto Bietti, Eric Vanden-Eijnden, Joan Bruna

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental "Our study covers both maximum likelihood and Stein Discrepancy estimators, and we validate our theoretical results with numerical experiments on synthetic data." and, from Section 6 (Experiments): "In this section, we present numerical experiments illustrating our theory on simple synthetic datasets generated by teacher models with energies $f(x) = \frac{1}{J}\sum_{j=1}^{J} w_j\,\sigma(\langle \theta_j, x \rangle)$, with $\theta_i \in \mathbb{S}^d$ for all $i$."
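For concreteness, a minimal NumPy sketch of how such a teacher energy could be evaluated, assuming a ReLU nonlinearity and treating d as the ambient dimension; the dimension, teacher width J, and weight values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere in R^d."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def teacher_energy(x, thetas, w):
    """Teacher energy f(x) = (1/J) * sum_j w_j * sigma(<theta_j, x>), with sigma = ReLU."""
    pre = x @ thetas.T                      # (n, J) inner products <theta_j, x>
    return np.maximum(pre, 0.0) @ w / len(w)

rng = np.random.default_rng(0)
d, J = 10, 5                                # ambient dimension and teacher width (illustrative)
thetas = sample_sphere(J, d, rng)           # teacher neurons theta_j on the sphere
w = rng.choice([-1.0, 1.0], size=J)         # teacher output weights (illustrative)
x = sample_sphere(3, d, rng)                # a few query points on the sphere
print(teacher_energy(x, thetas, w))         # energies f(x) for the three points
```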
Researcher Affiliation Academia 1Courant Institute of Mathematical Sciences, New York University 2Center for Data Science, New York University.
Pseudocode Yes Algorithm 1 Generic algorithm to train F1 EBMs
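The paper's Algorithm 1 itself is not quoted here. As a hedged illustration only, the sketch below shows one generic way a maximum-likelihood training loop for a shallow EBM could be organized, with negative samples drawn from the current model by a user-supplied MCMC routine (such as the Metropolis-Hastings sampler sketched further below). Only the output weights are updated for brevity, and an l2 penalty stands in for the F1/F2 regularization; all names, defaults, and these simplifications are assumptions, not the authors' implementation:

```python
import numpy as np

def relu_features(x, thetas):
    """Feature map phi_j(x) = relu(<theta_j, x>), shape (n, m)."""
    return np.maximum(x @ thetas.T, 0.0)

def energy(x, thetas, w):
    """Shallow EBM energy f(x) = (1/m) * sum_j w_j * relu(<theta_j, x>)."""
    return relu_features(x, thetas) @ w / len(w)

def train_ebm(data, sample_model, m=500, lam=1e-3, step=1e-2, iters=200, seed=0):
    """Toy maximum-likelihood loop fitting the output weights w of an m-neuron EBM."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    thetas = rng.standard_normal((m, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)   # neurons kept on the sphere
    w = np.zeros(m)
    for _ in range(iters):
        # Negative samples from the current model, e.g. via MCMC (user-supplied routine).
        neg = sample_model(lambda x: energy(x, thetas, w), n, d, rng)
        # Gradient of the negative log-likelihood w.r.t. w for p(x) proportional to exp(-f(x)):
        # E_data[phi(x)] - E_model[phi(x)], scaled by 1/m, plus an (assumed) l2 penalty.
        grad_w = (relu_features(data, thetas).mean(axis=0)
                  - relu_features(neg, thetas).mean(axis=0)) / m + lam * w
        w -= step * grad_w
    return thetas, w
```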
Open Source Code Yes The code for the experiments is in https://github.com/CDEnrich/ebms_shallow_nn.
Open Datasets No The paper uses 'simple synthetic datasets generated by teacher models' and describes the generation process ('We generate data on the sphere Sd from teacher models by using a simple rejection sampling strategy'), but does not provide access information (link, DOI, repository, or formal citation) for a publicly available or open dataset.
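As a hedged sketch of what a simple rejection sampling strategy on the sphere could look like (not the authors' code): uniform-on-the-sphere proposals targeting a Gibbs density proportional to exp(-f(x)). The lower bound f_min on the energy is an assumption needed to keep acceptance probabilities in (0, 1]:

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere in R^d."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def rejection_sample(energy_fn, f_min, n, d, rng, batch=1024):
    """Collect n samples from a density proportional to exp(-energy_fn(x)) on the sphere.

    Uses uniform proposals and accepts x with probability exp(-(f(x) - f_min)),
    which is at most 1 provided f_min <= min_x f(x).
    """
    chunks, total = [], 0
    while total < n:
        x = sample_sphere(batch, d, rng)                          # uniform proposals
        accept = rng.uniform(size=batch) < np.exp(-(energy_fn(x) - f_min))
        chunks.append(x[accept])
        total += accept.sum()
    return np.concatenate(chunks)[:n]
```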
Dataset Splits Yes We report test metrics after selecting hyperparameters on a validation set of 2000 samples.
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It discusses computational aspects generally but without concrete specifications.
Software Dependencies No The paper does not provide specific version numbers for any software, libraries, or dependencies (e.g., Python, PyTorch, TensorFlow, etc.) used to conduct the experiments, which are necessary for full reproducibility.
Experiment Setup Yes For different numbers of training samples, we run our gradient-based algorithms in F1 and F2 with different choices of step-sizes and regularization parameters λ, using m = 500 neurons. We report test metrics after selecting hyperparameters on a validation set of 2000 samples. For computing gradients in maximum likelihood training, we use a simple Metropolis-Hastings algorithm with uniform proposals on the sphere.
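As a hedged sketch of the kind of sampler the quoted setup describes (Metropolis-Hastings with uniform proposals on the sphere, i.e. an independence sampler targeting a Gibbs density proportional to exp(-f(x))); the chain count, step count, and function names are illustrative assumptions rather than the authors' settings:

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere in R^d."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def mh_uniform_sphere(energy_fn, n_chains, d, n_steps, rng):
    """Metropolis-Hastings with uniform-on-sphere proposals, targeting exp(-f(x))."""
    x = sample_sphere(n_chains, d, rng)
    e = energy_fn(x)
    for _ in range(n_steps):
        prop = sample_sphere(n_chains, d, rng)      # symmetric uniform proposal
        e_prop = energy_fn(prop)
        # Accept with probability min(1, exp(-(f(prop) - f(x)))).
        accept = rng.uniform(size=n_chains) < np.exp(-(e_prop - e))
        x[accept] = prop[accept]
        e[accept] = e_prop[accept]
    return x
```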