Learning Curves for Deep Structured Gaussian Feature Models
Authors: Jacob Zavatone-Veth, Cengiz Pehlevan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 2: Generalization for power-law spectra. (a) Target-averaged generalization error as a function of training data density 1/α0 for shallow models (L = 1) of varying hidden layer width 1/α1 in the absence of label noise (η = 0). Here, the data and weight spectra have identical power law decay ω0 = ω1 = 1. (b) As in (a), but in the presence of label noise (η = 1/2). (c) As in (b), but for fixed hidden layer width 1/α1 = 4, fixed data exponent ω0 = 1, and varying weight exponents ω1. In all cases, solid lines show the predictions of (31), while dots with error bars show the mean and standard error over 100 realizations of numerical experiments with n0 = 1000. See Appendix F for details of our numerical methods. F Numerical methods: In this appendix, we describe the numerical methods used to produce Figures 1 and 2. All simulations were performed using MATLAB 9.13 (R2022b; The MathWorks, Natick, MA, USA; https://www.mathworks.com/products/matlab.html) on a desktop workstation (CPU: Intel Xeon W-2145; 64 GB RAM). They were not computationally intensive, and required less than an hour of compute time in total. Code to reproduce the figures is archived as part of the online supplemental material. Numerical computation of the solution to the ridgeless regression problem (the minimum-norm interpolant) was performed using the lsqminnorm solver (https://www.mathworks.com/help/matlab/ref/lsqminnorm.html), which uses an algorithm based on the complete orthogonal decomposition of the design matrix. |
| Researcher Affiliation | Academia | Jacob A. Zavatone-Veth1,2 and Cengiz Pehlevan3,2,4: 1Department of Physics, 2Center for Brain Science, 3John A. Paulson School of Engineering and Applied Sciences, 4Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA 02138, USA. jzavatoneveth@g.harvard.edu, cpehlevan@seas.harvard.edu |
| Pseudocode | No | The paper describes mathematical derivations and numerical methods but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce the figures is archived as part of the online supplemental material. |
| Open Datasets | No | The paper generates synthetic data: 'We generate training datasets according to a structured Gaussian covariate model, with p i.i.d. training examples (xµ, yµ) generated as xµ ~ N(0, Σ0), yµ = (1/√n0) w·xµ + ξµ'. It does not use or provide access to a publicly available dataset. |
| Dataset Splits | No | The paper describes generating 'training datasets' but does not specify explicit training, validation, or test splits for data. It discusses 'training data' in the context of theoretical analysis and numerical experiments, but not with reproducibility-focused split details. |
| Hardware Specification | Yes | All simulations were performed using MATLAB 9.13 (R2022b; The MathWorks, Natick, MA, USA; https://www.mathworks.com/products/matlab.html) on a desktop workstation (CPU: Intel Xeon W-2145; 64 GB RAM). |
| Software Dependencies | Yes | All simulations were performed using MATLAB 9.13 (R2022b; The MathWorks, Natick, MA, USA; https://www.mathworks.com/products/matlab.html). |
| Experiment Setup | Yes | Dots with error bars show the mean and standard error over 100 realizations of numerical experiments with n0 = 1000. Here, the data and weight spectra have identical power law decay ω0 = ω1 = 1. As in (a), but in the presence of label noise (η = 1/2). Numerical computation of the solution to the ridgeless regression problem (the minimum-norm interpolant) was performed using the lsqminnorm solver. |
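
The paper's ridgeless fits use MATLAB's `lsqminnorm`, which returns the minimum-norm interpolant of an underdetermined least-squares problem. A minimal NumPy sketch of the same computation (Python rather than MATLAB, since the archived code is MATLAB; the helper name `min_norm_solution` is ours, not the paper's):

```python
import numpy as np

def min_norm_solution(X, y):
    # The Moore-Penrose pseudoinverse yields the minimum-norm solution
    # of X w = y, matching what MATLAB's lsqminnorm computes for
    # rank-deficient or underdetermined design matrices.
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
p, n = 20, 100  # fewer samples than features: the interpolation regime
X = rng.standard_normal((p, n))
y = rng.standard_normal(p)

w = min_norm_solution(X, y)
print(np.allclose(X @ w, y))  # a Gaussian X with p < n interpolates exactly
```

`np.linalg.lstsq` (LAPACK `gelsd`) also returns the minimum-norm solution for underdetermined systems, so either route reproduces the same interpolant; `lsqminnorm` itself uses a complete orthogonal decomposition instead, which differs only in numerics, not in the solution.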
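
The experiment setup quoted above (structured Gaussian covariates with power-law spectra, noisy linear teacher, minimum-norm fit) can be sketched end to end. This is an illustrative toy, not the paper's code: the diagonal covariance with eigenvalues λ_j ∝ j^(-ω0), the Gaussian teacher weights, and the exact normalizations are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed-for-illustration parameters: input dimension n0, sample count p,
# data spectral exponent omega0, label noise scale eta.
n0, p, omega0, eta = 200, 100, 1.0, 0.5

# Diagonal data covariance with power-law eigenvalues lambda_j ~ j^{-omega0}.
spectrum = np.arange(1, n0 + 1, dtype=float) ** (-omega0)

# x^mu ~ N(0, Sigma0) with Sigma0 = diag(spectrum); labels from a linear
# teacher, y^mu = w . x^mu / sqrt(n0) + noise, as in the quoted data model.
X = rng.standard_normal((p, n0)) * np.sqrt(spectrum)
w_teacher = rng.standard_normal(n0)
y = X @ w_teacher / np.sqrt(n0) + eta * rng.standard_normal(p)

# Ridgeless (minimum-norm) fit, then a Monte Carlo estimate of the
# generalization error on fresh noiseless test points.
w_hat = np.linalg.pinv(X) @ y
X_test = rng.standard_normal((1000, n0)) * np.sqrt(spectrum)
y_test = X_test @ w_teacher / np.sqrt(n0)
mse = np.mean((X_test @ w_hat - y_test) ** 2)
print(mse)
```

Averaging `mse` over many draws of `(X, w_teacher, noise)`, as the paper does over 100 realizations, would give the dots-with-error-bars quantity compared against the theory curves in Figure 2.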