Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Geometry of Neural Network Loss Surfaces via Random Matrix Theory
Authors: Jeffrey Pennington, Yasaman Bahri
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We conduct large-scale experiments to examine the distribution of critical points and compare with our theoretical predictions. |
| Researcher Affiliation | Industry | Jeffrey Pennington¹, Yasaman Bahri¹. ¹Google Brain. Correspondence to: Jeffrey Pennington <EMAIL>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Data is for a trained single-hidden-layer ReLU autoencoding network with 20 hidden units and no biases on 150 4×4 downsampled, grayscaled, whitened CIFAR-10 images. Dataset was taken from 4×4 downsampled, grayscaled, whitened CIFAR-10 images. |
| Dataset Splits | No | The paper mentions using random sampling for data but does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | We train single-hidden-layer tanh networks of size n = 16, which also equals the input and output dimensionality. For each training run, the data and targets are randomly sampled from standard normal distributions, which makes this a kind of memorization task. [...] First we optimize the network with standard gradient descent until the loss reaches a random value between 0 and the initial loss. From that point on, we switch to minimizing a new objective, $J_g = \lvert \nabla_\theta L \rvert^2$, which, unlike the primary objective, is attracted to saddle points. Gradient descent on $J_g$ only requires the computation of Hessian-vector products and can be executed efficiently. We discard any run for which the final $J_g > 10^{-6}$; otherwise we record the final energy and index. |
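
The two-phase search quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The quoted text does not state the loss function, the number of samples, the initialization, or the step sizes, so the half mean squared error, `m`, the `1/sqrt(n)` weight scale, `lr`, and the accept-or-halve step rule below are all assumptions. The Hessian-vector product is approximated by central finite differences of the gradient as a stand-in for an exact one, and "index" is taken here to mean the fraction of negative Hessian eigenvalues.

```python
import numpy as np


def loss_and_grad(theta, X, Y):
    """Half mean squared error (assumed loss) of a bias-free tanh network, plus gradient."""
    n, m = X.shape
    W1 = theta[: n * n].reshape(n, n)
    W2 = theta[n * n:].reshape(n, n)
    H = np.tanh(W1 @ X)                     # hidden activations, shape (n, m)
    R = W2 @ H - Y                          # residuals, shape (n, m)
    loss = 0.5 * np.sum(R ** 2) / m
    grad_W2 = R @ H.T / m
    grad_W1 = ((W2.T @ R) * (1.0 - H ** 2)) @ X.T / m
    return loss, np.concatenate([grad_W1.ravel(), grad_W2.ravel()])


def hvp(theta, v, X, Y, eps=1e-5):
    """Finite-difference Hessian-vector product (stand-in for an exact one)."""
    norm = np.linalg.norm(v)
    if not norm > 0.0:
        return np.zeros_like(v)
    u = v / norm                            # perturb along a unit vector
    g_plus = loss_and_grad(theta + eps * u, X, Y)[1]
    g_minus = loss_and_grad(theta - eps * u, X, Y)[1]
    return norm * (g_plus - g_minus) / (2.0 * eps)


def run_once(n=16, m=32, lr=0.01, gd_steps=2000, jg_steps=5000, tol=1e-6, seed=0):
    """Return (energy, index) for one run, or None if the run is discarded.

    m, lr, the step counts, and the initialization scale are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))         # random inputs
    Y = rng.standard_normal((n, m))         # random targets (memorization task)
    theta = rng.standard_normal(2 * n * n) / np.sqrt(n)

    # Phase 1: plain gradient descent on L down to a random target loss.
    target = rng.uniform(0.0, loss_and_grad(theta, X, Y)[0])
    for _ in range(gd_steps):
        loss, g = loss_and_grad(theta, X, Y)
        if loss <= target:
            break
        theta = theta - lr * g

    # Phase 2: descend J_g = |grad L|^2, whose gradient is 2 * H @ grad L.
    step = lr
    for _ in range(jg_steps):
        g = loss_and_grad(theta, X, Y)[1]
        jg = g @ g
        if jg <= tol:
            break
        trial = theta - step * 2.0 * hvp(theta, g, X, Y)
        g_trial = loss_and_grad(trial, X, Y)[1]
        if g_trial @ g_trial < jg:          # accept only if J_g decreased
            theta, step = trial, step * 1.1
        else:                               # otherwise shrink the step (assumed rule)
            step *= 0.5

    energy, g = loss_and_grad(theta, X, Y)
    jg = g @ g
    if not np.isfinite(jg) or jg > tol:     # discard runs that did not converge
        return None

    # Index: fraction of negative Hessian eigenvalues (dense Hessian via HVPs).
    hessian = np.stack([hvp(theta, e, X, Y) for e in np.eye(theta.size)])
    eigenvalues = np.linalg.eigvalsh(0.5 * (hessian + hessian.T))
    return energy, float(np.mean(eigenvalues < 0.0))
```

Calling `run_once(seed=s)` over many seeds and keeping the results that are not `None` would give a scatter of (energy, index) pairs. With an automatic differentiation library, the finite-difference `hvp` could be swapped for an exact Hessian-vector product without changing the rest of the sketch.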