Curvature-corrected learning dynamics in deep neural networks

Authors: Dongsung Huh

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To test the main theoretical results, we conducted a simple synthetic data experiment."
Researcher Affiliation | Collaboration | MIT-IBM Watson AI Lab, Cambridge, Massachusetts, USA.
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | "To test the main theoretical results, we conducted a simple synthetic data experiment, in which the training and the testing datasets are generated from a random teacher network as $y^\mu = w_{\mathrm{teacher}} x^\mu + z^\mu$, where $x^\mu \in \mathbb{R}^N$ is the whitened input data, $y^\mu \in \mathbb{R}^N$ is the output, and $z^\mu \in \mathbb{R}^N$ is the noise" (Lampinen & Ganguli, 2018). (See the data-generation sketch below.)
Dataset Splits | No | The paper mentions training and testing datasets but does not specify exact split percentages or absolute sample counts, and it does not explicitly mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., library or solver names and versions).
Experiment Setup | Yes | The student network is trained from small random initial weights. Hessian$^+$ blocks are computed as described in Bernacchia et al. (2018) and Botev et al. (2017) and combined to obtain the full Hessian$^+$. NGD-d and NGD$^{1/2}$-d only used the diagonal blocks. Numerical pseudo-inverses (and sqrt-inverses) are computed via singular value decomposition (SVD). For numerical stability, NGD and NGD-d used Levenberg-Marquardt damping of $\epsilon = 10^{-5}$ and update-speed clipping. The input-output map of the teacher network $w_{\mathrm{teacher}} \in \mathbb{R}^{N \times N}$ has a low-rank structure (rank 3, Fig. 4A), and the student is a depth $d = 4$ linear network of constant width $N = 16$. The training set $\{x^\mu, y^\mu\}_{\mu=1}^{P}$ has $P = N$ examples. (See the illustrative sketches below.)
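
As an illustration of the data-generation recipe quoted above, here is a minimal NumPy sketch of the random low-rank teacher setup. The function name `make_teacher_data`, the noise level, and the seed are our own assumptions for the sake of a runnable example, not details taken from the paper.

```python
import numpy as np

def make_teacher_data(N=16, P=16, rank=3, noise_std=0.1, seed=0):
    """Draw {x^mu, y^mu}, mu = 1..P, from a random low-rank teacher:
    y^mu = w_teacher @ x^mu + z^mu (cf. Lampinen & Ganguli, 2018)."""
    rng = np.random.default_rng(seed)
    # Rank-3 teacher map w_teacher in R^{N x N}, as in Fig. 4A.
    w_teacher = rng.standard_normal((N, rank)) @ rng.standard_normal((rank, N))
    # Whitened inputs: with P = N, scale an orthogonal basis so (1/P) X X^T = I.
    Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
    X = np.sqrt(P) * Q[:, :P]                    # inputs x^mu as columns
    Z = noise_std * rng.standard_normal((N, P))  # noise z^mu (std is our guess)
    Y = w_teacher @ X + Z                        # outputs y^mu as columns
    return w_teacher, X, Y
```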
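
Likewise, the SVD-based pseudo-inverse (and sqrt-inverse) with Levenberg-Marquardt damping described in the setup can be sketched as follows. `damped_pinv` is a hypothetical helper name, and this is one plausible reading of the procedure, not the author's released code.

```python
import numpy as np

def damped_pinv(H, eps=1e-5, power=-1.0):
    """(H + eps * I)^power for a symmetric PSD curvature block H, via SVD.
    power=-1.0: damped pseudo-inverse (NGD); power=-0.5: damped
    inverse square root (the sqrt-inverse variant)."""
    U, s, Vt = np.linalg.svd(H, hermitian=True)
    # Levenberg-Marquardt damping: shift every singular value by eps
    # before taking the (possibly fractional) negative power.
    return (U * (s + eps) ** power) @ Vt
```

A per-block update would then look something like `delta = damped_pinv(H_block) @ grad_block`, with the step size clipped to cap the update speed, matching the stability measures listed in the setup row above.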