On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model
Authors: Peizhong Ju, Xiaojun Lin, Ness Shroff
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use numerical results to illustrate the different roles of p1 and p2 in reducing the generalization error. We fix p2 = 200 and plot the MSE with respect to p1 in Fig. 2(b). Although the test error decreases as p1 increases, the decrease is slow, especially in the noisy setting. This slow decrease with p1 persists even when p2 is fixed at a much higher value: in Fig. 2(c) we fix p2 at a much larger value and still observe a similarly slow decrease with p1. In contrast, the descent with respect to p2 is easier to observe and reaches a lower test MSE. In Fig. 2(d), we fix p1 = 200 and increase p2 (i.e., we exchange the roles of p1 and p2 relative to Fig. 2(c)). All three curves in Fig. 2(d) show a more pronounced descent and reach a lower MSE than those in Fig. 2(c), which supports the conjecture that the generalization error decreases faster with the number of neurons in the second hidden layer. |
| Researcher Affiliation | Academia | Peizhong Ju, Department of ECE, The Ohio State University, Columbus, OH 43210 (ju.171@osu.edu); Xiaojun Lin, School of ECE, Purdue University, West Lafayette, IN 47906 (linx@purdue.edu); Ness B. Shroff, Department of ECE, The Ohio State University, Columbus, OH 43210 (shroff.11@osu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] As this is mainly a theoretical paper. |
| Open Datasets | No | The paper states 'Let (X_i, f(X_i) + ε_i), i = 1, 2, ..., n denote n pieces of training data' and gives the ground-truth function f(x) = (x^T e_1)^2 + (x^T e_1)^3 with d = 3, indicating synthetic data generated from a known function, not a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not provide specific dataset split information for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We fix p2 = 200 and plot the MSE with respect to p1 in Fig. 2(b). ... We fix p1 = 200 and increase p2. ... We fix n = 200 and let p1 = p2 increase simultaneously. The ground-truth model in this figure is f(x) = (x^T e_1)^2 + (x^T e_1)^3 where d = 3. The green, orange, and blue curves denote the cases σ^2 = 0 (no noise), σ^2 = 0.01, and σ^2 = 0.04, respectively. Every point in this figure is the median of 20 simulation runs. (A hedged code sketch of this setup follows the table.) |
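
For readers who want a concrete handle on the setup summarized in the Experiment Setup row, the Python sketch below reproduces the quoted protocol: n = 200 training points from f(x) = (x^T e_1)^2 + (x^T e_1)^3 with d = 3, noise levels σ^2 ∈ {0, 0.01, 0.04}, and the median test MSE over 20 runs while one hidden-layer width is fixed and the other is swept. The input distribution, the width grid, and all function names are illustrative assumptions, and the minimum-norm random-feature model is a simplified stand-in for the paper's actual three-layer NTK construction, not the authors' implementation.

```python
import numpy as np


def make_synthetic_data(n=200, d=3, noise_var=0.01, rng=None):
    """Generate (X_i, f(X_i) + eps_i), i = 1, ..., n, with ground truth
    f(x) = (x^T e_1)^2 + (x^T e_1)^3 and d = 3, as quoted above.
    Drawing X uniformly on the unit sphere is an assumption; the excerpts
    do not specify the input distribution."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-sphere inputs (assumption)
    z = X[:, 0]                                    # x^T e_1 is the first coordinate
    y = z ** 2 + z ** 3 + np.sqrt(noise_var) * rng.standard_normal(n)
    return X, y


def overfitted_test_mse(p1, p2, noise_var, n=200, n_test=1000, seed=0):
    """Fit the minimum-l2-norm linear model on random two-hidden-layer ReLU
    features (p1 and p2 neurons) and return its test MSE against the
    noiseless ground truth. This random-feature model is a simplified
    stand-in for the paper's three-layer NTK model."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr = make_synthetic_data(n, noise_var=noise_var, rng=rng)
    X_te, y_te = make_synthetic_data(n_test, noise_var=0.0, rng=rng)

    d = X_tr.shape[1]
    W1 = rng.standard_normal((d, p1)) / np.sqrt(d)    # first hidden layer
    W2 = rng.standard_normal((p1, p2)) / np.sqrt(p1)  # second hidden layer
    features = lambda X: np.maximum(np.maximum(X @ W1, 0.0) @ W2, 0.0)

    theta = np.linalg.pinv(features(X_tr)) @ y_tr     # min-norm "overfitted" fit
    return float(np.mean((features(X_te) @ theta - y_te) ** 2))


def sweep_widths(p1_grid, p2, noise_vars=(0.0, 0.01, 0.04), n_runs=20):
    """Median test MSE over n_runs runs per setting, mirroring 'every point
    in this figure is the median of 20 simulation runs'."""
    return {s2: [np.median([overfitted_test_mse(p1, p2, s2, seed=r)
                            for r in range(n_runs)])
                 for p1 in p1_grid]
            for s2 in noise_vars}


if __name__ == "__main__":
    # Illustrative grid: fix p2 = 200 and sweep p1, analogous to Fig. 2(b).
    print(sweep_widths(p1_grid=[50, 100, 200, 400, 800], p2=200))
```

The minimum-ℓ2-norm fit is used here because it is the standard proxy for an "overfitted" model that exactly fits noisy training labels; swapping in the paper's exact three-layer NTK feature map would only change the `features` function while leaving the sweep protocol unchanged.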