Second-order regression models exhibit progressive sharpening to the edge of stability

Authors: Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct a numerical analysis on the properties of a real neural network and use tools from our theoretical analysis to show that edge-of-stability behavior in the wild shows some of the same patterns as the theoretical models." "We conducted numerical experiments in real world models, and compare the behavior to our theory on simplified models." "Following (Cohen et al., 2022a), we trained a 2-hidden-layer tanh network using the squared loss on 5000 examples from CIFAR10 with learning rate 10^-2, a setting which shows edge-of-stability behavior."
Researcher Affiliation | Industry | "Google DeepMind. Correspondence to: Atish Agarwala <thetish@google.com>."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using the Neural Tangents library but does not provide concrete access to source code for the methodology it describes.
Open Datasets | Yes | "Following (Cohen et al., 2022a), we trained a 2-hidden-layer tanh network using the squared loss on 5000 examples from CIFAR10 with learning rate 10^-2, a setting which shows edge-of-stability behavior."
Dataset Splits | No | The paper mentions using "5000 examples from CIFAR10" but does not specify the training, validation, or test splits needed to reproduce the experiment.
Hardware Specification | No | The paper states "All experiments were conducted on GPU with float32 precision" but does not provide specific hardware details such as exact GPU/CPU models or processor types.
Software Dependencies | No | The paper mentions using the Neural Tangents library but does not provide version numbers for it or for any other software dependency (see the model-definition sketch after this table).
Experiment Setup | Yes | "we trained a 2-hidden-layer tanh network using the squared loss on 5000 examples from CIFAR10 with learning rate 10^-2" (Section 5); "Models were 2-hidden-layer fully-connected networks, with hidden width 256 and Erf non-linearities. Models were initialized with the NTK parameterization, with weight variance 1 and bias variance 0... A learning rate of 0.003204 was used in all experiments. All plots were made using float-64 precision." (Appendix D.3). Hedged code sketches of this setup follow the table below.
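
To make the quoted setup concrete, here is a minimal model-definition sketch using the Neural Tangents stax API. The paper names the library but not this exact construction, so the specific calls, the flattened 3072-dimensional input, and the use of b_std=0.0 to express "bias variance 0" are assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): a 2-hidden-layer, width-256
# Erf network with NTK parameterization, weight variance 1 and bias
# variance 0, built with the Neural Tangents stax API.
import jax
from neural_tangents import stax

init_fn, apply_fn, _ = stax.serial(
    stax.Dense(256, W_std=1.0, b_std=0.0, parameterization="ntk"),
    stax.Erf(),
    stax.Dense(256, W_std=1.0, b_std=0.0, parameterization="ntk"),
    stax.Erf(),
    stax.Dense(10, W_std=1.0, b_std=0.0, parameterization="ntk"),  # 10 CIFAR-10 classes
)

key = jax.random.PRNGKey(0)
# Assumption: CIFAR-10 images are flattened to 3072-dimensional inputs.
_, params = init_fn(key, input_shape=(-1, 3072))
```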
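A full-batch gradient-descent loop matching the quoted Section 5 setting (squared loss, 5000 CIFAR-10 examples, learning rate 10^-2), reusing apply_fn and params from the sketch above. The tensorflow_datasets loading path, pixel normalization, one-hot targets, the 1/2 factor in the loss, and the step count are all assumptions; the paper does not state them.

```python
# Sketch of the quoted training run; data-pipeline details are assumed.
import jax
import jax.numpy as jnp
import tensorflow_datasets as tfds

# 5000 CIFAR-10 training examples, as quoted from the paper.
ds = tfds.as_numpy(tfds.load("cifar10", split="train[:5000]", batch_size=-1))
x = jnp.asarray(ds["image"], jnp.float32).reshape(5000, -1) / 255.0
y = jax.nn.one_hot(jnp.asarray(ds["label"]), 10)

def mse_loss(params, x, y):
    # Squared loss; the 1/2 factor and mean normalization are assumptions.
    return 0.5 * jnp.mean((apply_fn(params, x) - y) ** 2)

lr = 1e-2  # learning rate quoted in Section 5

@jax.jit
def step(params):
    # One full-batch gradient-descent update.
    loss, grads = jax.value_and_grad(mse_loss)(params, x, y)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss

for t in range(10_000):  # step count is an arbitrary choice
    params, loss = step(params)
```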
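The hallmark of edge-of-stability behavior is that the sharpness (top eigenvalue of the loss Hessian) rises during training until it hovers near 2/lr. One standard way to track it is power iteration with Hessian-vector products; the measurement below is our addition for illustration, not the paper's code, and the iteration count is arbitrary.

```python
# Sketch: estimate the sharpness of mse_loss at the current params.
import jax
import jax.numpy as jnp

def hvp(params, v):
    # Hessian-vector product via forward-over-reverse autodiff.
    grad_fn = jax.grad(lambda p: mse_loss(p, x, y))
    return jax.jvp(grad_fn, (params,), (v,))[1]

def sharpness(params, key, n_iter=50):
    # Power iteration for the top Hessian eigenvalue; n_iter is arbitrary.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    v = treedef.unflatten(
        [jax.random.normal(k, p.shape) for k, p in zip(keys, leaves)]
    )
    for _ in range(n_iter):
        v = hvp(params, v)
        norm = jnp.sqrt(sum(jnp.vdot(u, u) for u in jax.tree_util.tree_leaves(v)))
        v = jax.tree_util.tree_map(lambda u: u / norm, v)
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    Hv = hvp(params, v)
    return sum(jnp.vdot(u, w) for u, w in zip(jax.tree_util.tree_leaves(v),
                                              jax.tree_util.tree_leaves(Hv)))

# At the edge of stability, sharpness(params, jax.random.PRNGKey(1))
# stays near 2 / lr rather than growing without bound.
```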