On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models
Authors: Peizhong Ju, Xiaojun Lin, Ness Shroff
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1. The test mean-square-error (MSE) vs. the number of features/neurons p for (a) learnable function and (b) not-learnable function when n = 50, d = 2, ‖ϵ‖₂² = 0.01. The corresponding ground-truth functions are (a) f(θ) = Σ_{k∈{0,1,2,4}} (sin(kθ) + cos(kθ)), and (b) f(θ) = Σ_{k∈{3,5,7,9}} (sin(kθ) + cos(kθ)). (Note that in 2 dimensions every input x on the unit circle can be represented by an angle θ ∈ [−π, π]. See the end of Section 4.) Every curve is the average of 9 random simulation runs. Figure 3. The test MSE of the overfitted NTK model for the same ground-truth function as Fig. 1(a). (a) We fix n = 50 and increase p for different noise levels σ². (b) We fix p = 20000 and increase n. All data points in this figure are the average of five random simulation runs. |
| Researcher Affiliation | Academia | ¹School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA. ²Department of ECE and CSE, The Ohio State University, Columbus, Ohio, USA. |
| Pseudocode | No | The paper describes mathematical concepts and derivations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper describes the generation of synthetic data for experiments ('the inputs x are i.i.d. uniformly distributed in S^{d−1}, and the initial weights V0[j]'s are i.i.d. uniformly distributed in all directions in ℝ^d') but does not provide concrete access information (link, DOI, specific citation) for a publicly available or open dataset. (A hedged data-generation sketch appears after this table.) |
| Dataset Splits | No | The paper states 'n training samples' and evaluates 'test error' or 'test MSE' but does not specify explicit training/validation/test dataset splits (e.g., percentages, sample counts, or methodology for splitting). |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | Figure 1. The test mean-square-error (MSE) vs. the number of features/neurons p for (a) learnable function and (b) not-learnable function when n = 50, d = 2, ‖ϵ‖₂² = 0.01. ... For GD on the real neural network (NN), we use the step size 1/√p and the number of training epochs is fixed at 2000. In Fig. 3(a), we fix n = 50 and plot curves of the test MSE of the NTK overfitting solution as p increases. We let the noise ϵ_i in the i-th training sample be i.i.d. Gaussian with zero mean and variance σ². The green, red, and blue curves in Fig. 3(a) correspond to σ² = 0, σ² = 0.04, and σ² = 0.16, respectively. (A sketch of this setup follows the table.) |
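Since the paper releases no code, a minimal sketch of the synthetic data generation quoted above may help readers attempting reproduction. It assumes only what the captions state: n = 50 inputs uniform on the unit circle (d = 2), the learnable ground truth f(θ) = Σ_{k∈{0,1,2,4}} (sin(kθ) + cos(kθ)), and i.i.d. zero-mean Gaussian label noise. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma2 = 50, 2, 0.01  # sample count, input dimension, noise variance (Fig. 1 / Fig. 3 setup)

def ground_truth(theta, ks=(0, 1, 2, 4)):
    """Learnable target of Fig. 1(a): f(theta) = sum_k sin(k*theta) + cos(k*theta)."""
    return sum(np.sin(k * theta) + np.cos(k * theta) for k in ks)

# Inputs i.i.d. uniform on the unit circle S^1, parameterized by the angle theta.
theta_train = rng.uniform(-np.pi, np.pi, size=n)
X_train = np.stack([np.cos(theta_train), np.sin(theta_train)], axis=1)

# Noisy labels: i.i.d. zero-mean Gaussian noise with variance sigma2.
y_train = ground_truth(theta_train) + rng.normal(0.0, np.sqrt(sigma2), size=n)
```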
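The "NTK overfitting solution" in the last row can likewise be sketched. The block below (continuing from the data-generation sketch) assumes the standard two-layer ReLU NTK random-feature model, with per-neuron features x · 1{V0[j]ᵀx > 0}, and takes the minimum-ℓ2-norm interpolator via the pseudo-inverse; this is an illustrative reconstruction under those assumptions, not the authors' implementation.

```python
p = 2000  # number of neurons; the paper sweeps this up to 20000

# Initial weights V0[j], i.i.d. uniform over all directions in R^d.
V0 = rng.normal(size=(p, d))
V0 /= np.linalg.norm(V0, axis=1, keepdims=True)

def ntk_features(X, V0):
    """NTK feature map: phi(x) stacks x * 1{V0[j]^T x > 0} over all p neurons."""
    act = (X @ V0.T > 0).astype(float)                                # (n, p) activation pattern
    return (act[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)  # (n, p*d)

Phi = ntk_features(X_train, V0)

# Minimum-l2-norm solution that exactly fits (overfits) the noisy training labels.
delta = np.linalg.pinv(Phi) @ y_train

# Test MSE against the noiseless ground truth on fresh inputs.
theta_test = rng.uniform(-np.pi, np.pi, size=2000)
X_test = np.stack([np.cos(theta_test), np.sin(theta_test)], axis=1)
mse = np.mean((ntk_features(X_test, V0) @ delta - ground_truth(theta_test)) ** 2)
print(f"test MSE: {mse:.4f}")
```

When p·d ≥ n, the pseudo-inverse returns the interpolating solution of minimum ℓ2 norm, which matches the overfitted regime the paper studies; sweeping p and σ² in this sketch mimics the axes of Fig. 1 and Fig. 3.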