Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Authors: Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our theory with simulations on synthetic data and MNIST dataset.
Researcher Affiliation | Academia | (1) John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA; (2) Department of Physics, Harvard University, Cambridge, MA, USA; (3) Center for Brain Science, Harvard University, Cambridge, MA, USA. Correspondence to: Cengiz Pehlevan <cpehlevan@seas.harvard.edu>.
Pseudocode | Yes | Algorithm 1: Computing Theoretical Learning Curves (an illustrative sketch of this kind of spectral computation appears after the table).
Open Source Code | Yes | Code: https://github.com/Pehlevan-Group/NTK_Learning_Curves
Open Datasets | Yes | We verify our theory with simulations on synthetic data and MNIST dataset.
Dataset Splits | No | The paper does not explicitly detail the split percentages or counts for training, validation, and test sets. It mentions a 'random test sample' but not a full split.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions 'Neural-Tangents Library (Novak et al., 2020)' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Figure 3. (a) and (b) Learning curves for neural networks (NNs) on pure modes as defined in eq. (35). (c) Learning curve for the student-teacher setup defined in eq. (36). The theory curves shown as solid lines are again computed with eq. (21). The test errors for the finite-width neural networks and NTK are shown with dots and triangles, respectively. The generalization error was estimated by taking a random test sample of 1000 data points. The average was taken over 25 trials and the standard deviations are shown with error bars. The networks were initialized with the default Gaussian NTK parameterization (Jacot et al., 2018) and trained with stochastic gradient descent (details in SI Section 13). (Hedged illustrative sketches related to this row and the Pseudocode row follow the table.)
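
The Pseudocode row above points to the paper's Algorithm 1, "Computing Theoretical Learning Curves." The paper's exact expressions (its eq. (21)) are not reproduced in this report, so the code below is only a minimal sketch of what such a spectral learning-curve computation can look like: it assumes the curve is obtained from the kernel eigenvalue spectrum by solving a self-consistent equation for an effective regularization and then summing mode-wise error contributions. The function names, the fixed-point form, and the example power-law spectrum are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def effective_regularization(p, eigvals, ridge=0.0, n_iter=200):
    """Solve kappa = ridge + sum_k kappa * lam_k / (kappa + p * lam_k)
    by fixed-point iteration. Illustrative form, not the paper's exact eq. (21)."""
    kappa = ridge + eigvals.sum()  # crude initial guess
    for _ in range(n_iter):
        kappa = ridge + np.sum(kappa * eigvals / (kappa + p * eigvals))
    return kappa

def theory_learning_curve(sample_sizes, eigvals, target_coeffs, ridge=0.0):
    """Sum of mode-wise generalization errors over the kernel spectrum,
    evaluated at each training-set size p (hypothetical helper)."""
    curve = []
    for p in sample_sizes:
        kappa = effective_regularization(p, eigvals, ridge)
        gamma = np.sum(p * eigvals**2 / (kappa + p * eigvals)**2)
        mode_errors = eigvals * target_coeffs**2 * kappa**2 / (kappa + p * eigvals)**2
        curve.append(mode_errors.sum() / (1.0 - gamma))
    return np.array(curve)

# Example: power-law spectrum lam_k ~ k^{-2} with equal target weight on each mode.
eigvals = 1.0 / np.arange(1, 1001) ** 2
target_coeffs = np.ones_like(eigvals)
print(theory_learning_curve([10, 50, 100, 500], eigvals, target_coeffs, ridge=1e-3))
```

Fixed-point iteration is used here only for simplicity; a standard root finder applied to the same self-consistent equation would work equally well.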
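
The Experiment Setup row quotes the caption of Figure 3, in which the NTK test error is estimated on a random sample of 1000 test points. The authors' actual experiments live in the linked repository; purely as a hedged illustration of the Neural-Tangents library noted under Software Dependencies, the sketch below computes an infinite-width NTK regression prediction and a test mean-squared error on placeholder data. The architecture, widths, and synthetic target are assumptions and do not match the paper's settings.

```python
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Placeholder fully connected architecture; stax uses the NTK parameterization
# by default. Depth and widths here are illustrative, not the paper's configuration.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

# Synthetic stand-in data; the paper uses synthetic "pure mode" targets and MNIST.
key = random.PRNGKey(0)
k1, k2 = random.split(key)
x_train = random.normal(k1, (100, 10))
y_train = jnp.sum(x_train, axis=1, keepdims=True)   # toy target function
x_test = random.normal(k2, (1000, 10))               # 1000 test points, as in Figure 3
y_test = jnp.sum(x_test, axis=1, keepdims=True)

# Mean infinite-width NTK regression prediction at the end of training (t=None).
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_pred = predict_fn(x_test=x_test, get="ntk", t=None)

# Generalization error estimated as mean squared error over the random test sample.
test_mse = jnp.mean((y_pred - y_test) ** 2)
print(float(test_mse))
```

For the finite-width comparison in the figure, one would instead train the network returned by init_fn/apply_fn with stochastic gradient descent and average the resulting test error over repeated trials, as the caption describes (25 trials in the paper).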