Geometry of Neural Network Loss Surfaces via Random Matrix Theory
Authors: Jeffrey Pennington, Yasaman Bahri
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis predicts, and numerical simulations support, that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We conduct large-scale experiments to examine the distribution of critical points and compare them with our theoretical predictions. |
| Researcher Affiliation | Industry | Jeffrey Pennington and Yasaman Bahri, Google Brain. Correspondence to: Jeffrey Pennington <jpennin@google.com>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Data is for a trained single-hidden-layer ReLU autoencoding network with 20 hidden units and no biases, fit to 150 4×4-downsampled, grayscaled, whitened CIFAR-10 images. |
| Dataset Splits | No | The paper mentions using random sampling for data but does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | We train single-hidden-layer tanh networks of size n = 16, which also equals the input and output dimensionality. For each training run, the data and targets are randomly sampled from standard normal distributions, which makes this a kind of memorization task. [...] First we optimize the network with standard gradient descent until the loss reaches a random value between 0 and the initial loss. From that point on, we switch to minimizing a new objective, J_g = |∇_θ L|², which, unlike the primary objective, is attracted to saddle points. Gradient descent on J_g only requires the computation of Hessian-vector products and can be executed efficiently. We discard any run for which the final J_g > 10⁻⁶; otherwise we record the final energy and index. |
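
The experiment-setup row quoted above describes a two-phase procedure: ordinary gradient descent on the loss down to a random threshold, then gradient descent on the squared gradient norm J_g = |∇_θ L|² to land on a critical point whose energy and index are recorded. The sketch below is a rough illustration of that procedure, not the authors' code: it assumes PyTorch, and the number of training pairs, learning rates, and iteration counts are placeholders the paper does not specify.

```python
import torch

torch.manual_seed(0)

# n = 16 comes from the quoted setup; the number of training pairs (m),
# learning rates, and step counts are illustrative guesses.
n = 16
m = 16
X = torch.randn(m, n)   # random inputs  ~ N(0, 1)
Y = torch.randn(m, n)   # random targets ~ N(0, 1)  (memorization task)

# Single-hidden-layer tanh network with no biases.
W1 = (torch.randn(n, n) / n ** 0.5).requires_grad_()
W2 = (torch.randn(n, n) / n ** 0.5).requires_grad_()
params = [W1, W2]

def loss():
    return ((torch.tanh(X @ W1) @ W2 - Y) ** 2).mean()

def grad_norm_sq():
    # J_g = |grad_theta L|^2, built with create_graph=True so it can itself be
    # differentiated: its gradient, 2 * H * grad(L), is a Hessian-vector product.
    grads = torch.autograd.grad(loss(), params, create_graph=True)
    return sum((g ** 2).sum() for g in grads)

# Phase 1: plain gradient descent on L until the loss falls below a random
# threshold drawn uniformly between 0 and the initial loss.
threshold = torch.rand(1).item() * loss().item()
for _ in range(5000):
    if loss().item() <= threshold:
        break
    grads = torch.autograd.grad(loss(), params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= 0.05 * g

# Phase 2: gradient descent on J_g, which is attracted to critical points
# (including saddles) and only needs Hessian-vector products.
for _ in range(20000):
    hvp = torch.autograd.grad(grad_norm_sq(), params)
    with torch.no_grad():
        for p, g in zip(params, hvp):
            p -= 0.01 * g

final_jg = grad_norm_sq().item()
if final_jg <= 1e-6:  # otherwise the run is discarded
    # Record the energy and the index (fraction of negative Hessian eigenvalues).
    def loss_flat(theta):
        w1 = theta[: n * n].reshape(n, n)
        w2 = theta[n * n:].reshape(n, n)
        return ((torch.tanh(X @ w1) @ w2 - Y) ** 2).mean()

    theta = torch.cat([p.detach().reshape(-1) for p in params])
    hessian = torch.autograd.functional.hessian(loss_flat, theta)
    index = (torch.linalg.eigvalsh(hessian) < 0).float().mean().item()
    print(f"energy={loss().item():.4f}  index={index:.3f}  J_g={final_jg:.2e}")
else:
    print(f"discarded: J_g={final_jg:.2e} > 1e-6")
```

The point of minimizing J_g is that its gradient with respect to θ is a Hessian-vector product, which autograd evaluates by double backpropagation without ever forming the full Hessian; the full Hessian is only assembled at the end (here via `torch.autograd.functional.hessian`) to count negative eigenvalues for the index.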