On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Authors: Peizhong Ju, Xiaojun Lin, Ness Shroff

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1. The test mean-square error (MSE) vs. the number of features/neurons p for (a) the learnable function and (b) the not-learnable function when n = 50, d = 2, ∥ϵ∥₂² = 0.01. The corresponding ground-truth functions are (a) f(θ) = Σ_{k∈{0,1,2,4}} (sin(kθ) + cos(kθ)) and (b) f(θ) = Σ_{k∈{3,5,7,9}} (sin(kθ) + cos(kθ)). (Note that in 2 dimensions every input x on the unit circle can be represented by an angle θ ∈ [−π, π]. See the end of Section 4.) Every curve is the average of 9 random simulation runs. Figure 3. The test MSE of the overfitted NTK model for the same ground-truth function as Fig. 1(a). (a) We fix n = 50 and increase p for different noise levels σ². (b) We fix p = 20000 and increase n. All data points in this figure are the average of five random simulation runs.
Researcher Affiliation | Academia | ¹School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA. ²Department of ECE and CSE, The Ohio State University, Columbus, Ohio, USA.
Pseudocode | No | The paper describes mathematical concepts and derivations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the methodology described.
Open Datasets | No | The paper describes the generation of synthetic data for experiments ('the inputs x are i.i.d. uniformly distributed in S^{d−1}, and the initial weights V0[j]'s are i.i.d. uniformly distributed in all directions in R^d') but does not provide concrete access information (link, DOI, or specific citation) for a publicly available or open dataset.
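The quoted sampling scheme is simple enough to sketch. The snippet below is a minimal illustration (not the authors' code), assuming the standard trick that normalizing i.i.d. Gaussian vectors yields directions uniform on the sphere; the sizes n, p, d are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 50, 200, 2  # example sizes, not taken from the paper

# Inputs x_i i.i.d. uniform on the unit sphere S^{d-1}:
# normalize i.i.d. standard Gaussian vectors.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Initial weights V0[j] i.i.d. uniform in all directions of R^d,
# drawn the same way (only the direction is specified in the paper).
V0 = rng.standard_normal((p, d))
V0 /= np.linalg.norm(V0, axis=1, keepdims=True)
```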
Dataset Splits | No | The paper states 'n training samples' and evaluates 'test error' or 'test MSE' but does not specify explicit training/validation/test splits (e.g., percentages, sample counts, or a splitting methodology).
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used to run its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., 'Python 3.8, PyTorch 1.9').
Experiment Setup | Yes | Figure 1. The test mean-square error (MSE) vs. the number of features/neurons p for (a) the learnable function and (b) the not-learnable function when n = 50, d = 2, ∥ϵ∥₂² = 0.01. ... For GD on the real neural network (NN), we use the step size 1/p, and the number of training epochs is fixed at 2000. In Fig. 3(a), we fix n = 50 and plot curves of the test MSE of the NTK overfitting solution as p increases. We let the noise ϵᵢ in the i-th training sample be i.i.d. Gaussian with zero mean and variance σ². The green, red, and blue curves in Fig. 3(a) correspond to σ² = 0, σ² = 0.04, and σ² = 0.16, respectively.
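Since the paper describes the setup in prose only, any re-run must fill in details. The sketch below reproduces a Fig. 1(a)-style data point under stated assumptions: the NTK features of a two-layer ReLU net are taken as φ(x) = p^{-1/2} [x · 1{vⱼᵀx > 0}]ⱼ (flattened), and the "overfitted NTK model" is approximated by the minimum-ℓ2-norm interpolator from a least-squares solve. The functions `f_true` and `features`, and the values of p and σ², are illustrative choices, not the authors' exact code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 50, 2000, 2
sigma2 = 0.01  # assumed noise variance; the report quotes several levels

def f_true(theta):
    # Learnable ground truth from Fig. 1(a): sum over k in {0,1,2,4}.
    ks = np.array([0, 1, 2, 4])
    kt = np.outer(theta, ks)
    return np.sum(np.sin(kt) + np.cos(kt), axis=1)

def features(X, V0):
    # NTK-style random features of a two-layer ReLU network:
    # phi(x) = (1/sqrt(p)) * [x * 1{v_j . x > 0}]_{j=1..p}, flattened.
    act = (X @ V0.T > 0).astype(float)  # (num_samples, p) activation mask
    phi = act[:, :, None] * X[:, None, :]  # (num_samples, p, d)
    return phi.reshape(len(X), -1) / np.sqrt(V0.shape[0])

# Training inputs on the unit circle (d = 2), parameterized by an angle.
theta_tr = rng.uniform(-np.pi, np.pi, n)
X_tr = np.column_stack([np.cos(theta_tr), np.sin(theta_tr)])
y_tr = f_true(theta_tr) + rng.normal(0.0, np.sqrt(sigma2), n)

# Random initial weight directions, uniform on the circle.
V0 = rng.standard_normal((p, d))
V0 /= np.linalg.norm(V0, axis=1, keepdims=True)

Phi = features(X_tr, V0)
# Minimum-l2-norm solution that exactly fits (overfits) the training data.
w = np.linalg.lstsq(Phi, y_tr, rcond=None)[0]

# Test MSE against the noiseless ground truth on fresh points.
theta_te = rng.uniform(-np.pi, np.pi, 1000)
X_te = np.column_stack([np.cos(theta_te), np.sin(theta_te)])
mse = np.mean((features(X_te, V0) @ w - f_true(theta_te)) ** 2)
```

Averaging `mse` over several random seeds and sweeping p (or n, or σ²) would trace curves analogous to Figs. 1 and 3.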