Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Authors: Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our theory with simulations on synthetic data and MNIST dataset. |
| Researcher Affiliation | Academia | 1John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA 2Department of Physics, Harvard University, Cambridge, MA, USA 3Center for Brain Science, Harvard University, Cambridge, MA, USA. Correspondence to: Cengiz Pehlevan <cpehlevan@seas.harvard.edu>. |
| Pseudocode | Yes (see the learning-curve sketch below) | Algorithm 1 Computing Theoretical Learning Curves |
| Open Source Code | Yes | Code: https://github.com/Pehlevan-Group/NTK_Learning_Curves |
| Open Datasets | Yes | We verify our theory with simulations on synthetic data and MNIST dataset. |
| Dataset Splits | No | The paper does not explicitly detail the split percentages or counts for training, validation, and test sets. It mentions a 'random test sample' but not a full split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions 'Neural-Tangents Library (Novak et al., 2020)' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes (see the NTK sketch below) | Figure 3. (a) and (b) Learning curves for neural networks (NNs) on pure modes as defined in eq. (35). (c) Learning curve for the student teacher setup defined in (36). The theory curves shown as solid lines are again computed with eq. (21). The test error for the finite width neural networks and NTK are shown with dots and triangles respectively. The generalization error was estimated by taking a random test sample of 1000 data points. The average was taken over 25 trials and the standard deviations are shown with errorbars. The networks were initialized with the default Gaussian NTK parameterization (Jacot et al., 2018) and trained with stochastic gradient descent (details in SI Section 13). |
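For context on the "Pseudocode" row: the paper's Algorithm 1 and eq. (21) are not reproduced in this report, so the following is only a minimal sketch of how a spectrum-dependent theoretical learning curve of this kind is typically computed, assuming the self-consistent (fixed-point) form used in the kernel-regression generalization literature and a target expanded in the orthonormal kernel eigenbasis. The function name `theoretical_learning_curve`, the power-law example spectrum, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def theoretical_learning_curve(eigvals, coeffs, p_values, ridge=1e-6, tol=1e-10):
    """Sketch of a spectrum-based learning-curve estimate (not the paper's
    Algorithm 1 verbatim). eigvals are kernel eigenvalues eta_k; coeffs are
    target coefficients in the kernel eigenbasis."""
    errors = []
    for p in p_values:
        # Solve kappa = ridge + sum_k kappa*eta_k / (kappa + p*eta_k)
        # by fixed-point iteration, starting from an upper bound.
        kappa = ridge + eigvals.sum()
        for _ in range(10_000):
            new_kappa = ridge + np.sum(kappa * eigvals / (kappa + p * eigvals))
            if abs(new_kappa - kappa) < tol:
                kappa = new_kappa
                break
            kappa = new_kappa
        # gamma < 1 at the fixed point; it amplifies the mode-wise errors.
        gamma = np.sum(p * eigvals**2 / (kappa + p * eigvals) ** 2)
        mode_errors = kappa**2 * coeffs**2 / (kappa + p * eigvals) ** 2
        errors.append(mode_errors.sum() / (1.0 - gamma))
    return np.array(errors)

# Example: power-law spectrum eta_k ~ k^{-2} with flat target coefficients.
eigvals = 1.0 / np.arange(1, 1001) ** 2
coeffs = np.ones_like(eigvals)
print(theoretical_learning_curve(eigvals, coeffs, p_values=[10, 100, 1000]))
```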
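For the "Experiment Setup" and "Software Dependencies" rows: the paper trains finite-width networks with SGD and compares them against NTK predictions computed with the Neural-Tangents library, but reports neither library versions nor hardware. Below is a minimal sketch, assuming a recent neural-tangents/JAX install, of how a single infinite-width NTK test-error point could be obtained; the architecture, toy data, and sample sizes are placeholders, not the paper's MNIST or pure-mode setups.

```python
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width ReLU network; kernel_fn gives the analytic NTK.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1)
)

# Toy regression data (placeholder for the paper's datasets).
x_train = random.normal(random.PRNGKey(0), (100, 10))
y_train = jnp.sum(x_train, axis=1, keepdims=True)
x_test = random.normal(random.PRNGKey(1), (1000, 10))
y_test = jnp.sum(x_test, axis=1, keepdims=True)

# Closed-form predictor for infinite-time gradient descent on MSE.
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_pred = predict_fn(x_test=x_test, get='ntk')
print("NTK test MSE:", jnp.mean((y_pred - y_test) ** 2))
```

Averaging such test errors over several random draws of the training set, as the paper does over 25 trials, would yield one point of an empirical learning curve.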