Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Authors: Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our theory with simulations on synthetic data and MNIST dataset.
Researcher Affiliation | Academia | (1) John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA; (2) Department of Physics, Harvard University, Cambridge, MA, USA; (3) Center for Brain Science, Harvard University, Cambridge, MA, USA. Correspondence to: Cengiz Pehlevan <cpehlevan@seas.harvard.edu>.
Pseudocode | Yes | Algorithm 1: Computing Theoretical Learning Curves (an illustrative sketch of this kind of spectral computation appears after the table).
Open Source Code | Yes | Code: https://github.com/Pehlevan-Group/NTK_Learning_Curves
Open Datasets | Yes | We verify our theory with simulations on synthetic data and MNIST dataset.
Dataset Splits | No | The paper does not explicitly detail the split percentages or counts for training, validation, and test sets. It mentions a 'random test sample' but not a full split.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions 'Neural-Tangents Library (Novak et al., 2020)' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Figure 3. (a) and (b) Learning curves for neural networks (NNs) on pure modes as defined in eq. (35). (c) Learning curve for the student-teacher setup defined in eq. (36). The theory curves shown as solid lines are again computed with eq. (21). The test errors for the finite-width neural networks and NTK are shown with dots and triangles, respectively. The generalization error was estimated by taking a random test sample of 1000 data points. The average was taken over 25 trials and the standard deviations are shown with error bars. The networks were initialized with the default Gaussian NTK parameterization (Jacot et al., 2018) and trained with stochastic gradient descent (details in SI Section 13). (Hedged illustrative sketches related to this row and the Pseudocode row follow the table.)
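
The Pseudocode row above points to the paper's Algorithm 1, "Computing Theoretical Learning Curves." The paper's exact expressions (its eq. (21)) are not reproduced in this report, so the code below is only a minimal sketch of what such a spectral learning-curve computation can look like: it assumes the curve is obtained from the kernel eigenvalue spectrum by solving a self-consistent equation for an effective regularization and then summing mode-wise error contributions. The function names, the fixed-point form, and the example power-law spectrum are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def effective_regularization(p, eigvals, ridge=0.0, n_iter=200):
    """Solve kappa = ridge + sum_k kappa * lam_k / (kappa + p * lam_k)
    by fixed-point iteration. Illustrative form, not the paper's exact eq. (21)."""
    kappa = ridge + eigvals.sum()  # crude initial guess
    for _ in range(n_iter):
        kappa = ridge + np.sum(kappa * eigvals / (kappa + p * eigvals))
    return kappa

def theory_learning_curve(sample_sizes, eigvals, target_coeffs, ridge=0.0):
    """Sum of mode-wise generalization errors over the kernel spectrum,
    evaluated at each training-set size p (hypothetical helper)."""
    curve = []
    for p in sample_sizes:
        kappa = effective_regularization(p, eigvals, ridge)
        gamma = np.sum(p * eigvals**2 / (kappa + p * eigvals)**2)
        mode_errors = eigvals * target_coeffs**2 * kappa**2 / (kappa + p * eigvals)**2
        curve.append(mode_errors.sum() / (1.0 - gamma))
    return np.array(curve)

# Example: power-law spectrum lam_k ~ k^{-2} with equal target weight on each mode.
eigvals = 1.0 / np.arange(1, 1001) ** 2
target_coeffs = np.ones_like(eigvals)
print(theory_learning_curve([10, 50, 100, 500], eigvals, target_coeffs, ridge=1e-3))
```

Fixed-point iteration is used here only for simplicity; a standard root finder applied to the same self-consistent equation would work equally well.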
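
The Experiment Setup row quotes the caption of Figure 3, in which the NTK test error is estimated on a random sample of 1000 test points. The authors' actual experiments live in the linked repository; purely as a hedged illustration of the Neural-Tangents library noted under Software Dependencies, the sketch below computes an infinite-width NTK regression prediction and a test mean-squared error on placeholder data. The architecture, widths, and synthetic target are assumptions and do not match the paper's settings.

```python
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Placeholder fully connected architecture; stax uses the NTK parameterization
# by default. Depth and widths here are illustrative, not the paper's configuration.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

# Synthetic stand-in data; the paper uses synthetic "pure mode" targets and MNIST.
key = random.PRNGKey(0)
k1, k2 = random.split(key)
x_train = random.normal(k1, (100, 10))
y_train = jnp.sum(x_train, axis=1, keepdims=True)   # toy target function
x_test = random.normal(k2, (1000, 10))               # 1000 test points, as in Figure 3
y_test = jnp.sum(x_test, axis=1, keepdims=True)

# Mean infinite-width NTK regression prediction at the end of training (t=None).
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_pred = predict_fn(x_test=x_test, get="ntk", t=None)

# Generalization error estimated as mean squared error over the random test sample.
test_mse = jnp.mean((y_pred - y_test) ** 2)
print(float(test_mse))
```

For the finite-width comparison in the figure, one would instead train the network returned by init_fn/apply_fn with stochastic gradient descent and average the resulting test error over repeated trials, as the caption describes (25 trials in the paper).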