Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Curvature-corrected learning dynamics in deep neural networks
Authors: Dongsung Huh
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test the main theoretical results, we conducted a simple synthetic data experiment |
| Researcher Affiliation | Collaboration | MIT-IBM Watson AI Lab, Cambridge, Massachusetts, USA. |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | To test the main theoretical results, we conducted a simple synthetic data experiment, in which the training and the testing datasets are generated from a random teacher network as $y^\mu = w_{\text{teacher}} x^\mu + z^\mu$, where $x^\mu \in \mathbb{R}^N$ is the whitened input data, $y^\mu \in \mathbb{R}^N$ is the output, $z^\mu \in \mathbb{R}^N$ is the noise (Lampinen & Ganguli, 2018). |
| Dataset Splits | No | The paper mentions 'training and the testing datasets' but does not specify exact split percentages, absolute sample counts, or explicit mention of a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers). |
| Experiment Setup | Yes | The student network is trained from small random initial weights. Hessian+ blocks are computed as described in Bernacchia et al. (2018); Botev et al. (2017) and combined to obtain full Hessian+. NGD-d and √NGD-d only used the diagonal blocks. Numerical pseudo-inverses (and sqrt-inverses) are computed via singular value decomposition (SVD). For numerical stability, NGD and NGD-d used Levenberg-Marquardt damping of $\epsilon = 10^{-5}$ and update-speed clipping. The input-output map of the teacher network $w_{\text{teacher}} \in \mathbb{R}^{N \times N}$ has a low-rank structure (rank 3, Fig 4A) and the student is a depth $d = 4$ linear network of constant width $N = 16$. The number of training dataset $\{x^\mu, y^\mu\}_{\mu=1}^{P}$ is set to be $P = N$. |
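
Since the paper provides no code, the setup quoted in the Open Datasets and Experiment Setup rows can only be approximated. The following is a minimal NumPy sketch of the data generation and of a damped inverse computed via SVD, not the author's implementation. The function names, the teacher scaling, the noise level `noise_std`, the exact-whitening construction, the test-set size, and the way damping enters the inverse are all assumptions made for illustration; only N = 16, rank 3, P = N, and ϵ = 10⁻⁵ come from the quoted text.

```python
import numpy as np


def make_teacher(n=16, rank=3, seed=None):
    """Random (n, n) teacher map of the given rank (the scaling is an assumption)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n, rank))
    v = rng.standard_normal((rank, n))
    return u @ v / np.sqrt(n)


def make_dataset(w_teacher, p, noise_std=0.1, seed=None):
    """Return (x, y) with y = w_teacher @ x + z; columns are the samples mu."""
    rng = np.random.default_rng(seed)
    n = w_teacher.shape[1]
    if p < n:
        raise ValueError("exact whitening in this sketch needs p >= n")
    # q has orthonormal columns, so x @ x.T / p equals the identity (whitened inputs).
    q, _ = np.linalg.qr(rng.standard_normal((p, n)))
    x = np.sqrt(p) * q.T
    # Gaussian noise with an assumed standard deviation.
    z = noise_std * rng.standard_normal((w_teacher.shape[0], p))
    return x, w_teacher @ x + z


def damped_inverse_power(h, eps=1e-5, power=1.0):
    """Compute (h + eps*I)^(-power) for a symmetric PSD matrix h via SVD.

    power=1.0 gives a damped inverse, power=0.5 a damped sqrt-inverse.
    """
    damped = h + eps * np.eye(h.shape[0])
    u, s, vt = np.linalg.svd(damped)
    # Scale column j of u by s[j]**(-power), then recombine with vt.
    return (u * s ** (-power)) @ vt


N = 16  # width quoted in the Experiment Setup row
w_teacher = make_teacher(n=N, rank=3, seed=0)
x_train, y_train = make_dataset(w_teacher, p=N, seed=1)  # P = N as quoted
x_test, y_test = make_dataset(w_teacher, p=N, seed=2)    # test size is an assumption
```

Adding the damping term before the decomposition keeps the matrix positive definite, so the left and right singular vectors coincide and the result is a genuine matrix power. Whether the paper applies damping in this way is not stated in the quoted text.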