Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Authors: Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate several claims made in this paper in Section 5. First, we confirm on synthetic data that neural networks do generalize better with an explicit regularizer vs. without. Second, we show that for two-layer networks, the test error decreases and margin increases as the hidden layer grows, as predicted by our theory. |
| Researcher Affiliation | Academia | Colin Wei, Department of Computer Science, Stanford University (colinwei@stanford.edu); Jason D. Lee, Department of Electrical Engineering, Princeton University (jasonlee@princeton.edu); Qiang Liu, Department of Computer Science, University of Texas at Austin (lqiang@cs.texas.edu); Tengyu Ma, Department of Computer Science, Stanford University (tengyuma@stanford.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | The data comes from a ground truth network with 10 hidden units, input dimension 20, and a ground truth unnormalized margin of at least 0.01. |
| Dataset Splits | No | The paper mentions a 'training set of size 200' but does not provide specific details on training/validation/test splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We use a training set of size 200 and train for 20000 steps with learning rate 0.1, once using regularizer λ = 5×10⁻⁴ and once using regularization λ = 0. (A hedged code sketch of this setup follows the table.) |
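
The table quotes the paper's full synthetic setup: labels generated by a ground-truth two-layer network with 10 hidden units and input dimension 20, a training set of size 200, 20000 SGD steps at learning rate 0.1, and λ ∈ {5×10⁻⁴, 0}. Below is a minimal PyTorch sketch of that setup. It is not the authors' code: the learner's width, the ReLU activation, the logistic loss, the random seed, and the plain ℓ2 penalty on all weights are illustrative assumptions, and the paper's margin-0.01 filtering of the ground-truth data is omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: labels come from a ground-truth two-layer ReLU network
# with 10 hidden units and input dimension 20 (as described in the paper).
# The paper also requires an unnormalized margin of at least 0.01 on this
# data; that filtering step is omitted here for brevity.
d, h_true, n_train = 20, 10, 200
W_true = torch.randn(h_true, d)
a_true = torch.randn(h_true)

X = torch.randn(n_train, d)
y = torch.sign(torch.relu(X @ W_true.t()) @ a_true)  # binary labels in {-1, +1}


def train(width, lam, steps=20000, lr=0.1):
    """Train a two-layer ReLU net with logistic loss + lam * ||theta||^2."""
    model = nn.Sequential(nn.Linear(d, width, bias=False),
                          nn.ReLU(),
                          nn.Linear(width, 1, bias=False))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = model(X).squeeze(-1)
        loss = torch.nn.functional.softplus(-y * out).mean()  # logistic loss
        loss = loss + lam * sum((p ** 2).sum() for p in model.parameters())
        loss.backward()
        opt.step()
    return model


# Once with the explicit regularizer, once without, as in the reported setup.
# The hidden width of 50 is an illustrative choice, not a value from the paper.
model_reg = train(width=50, lam=5e-4)
model_unreg = train(width=50, lam=0.0)
```

Comparing the test error of `model_reg` and `model_unreg` on freshly drawn data from the same ground-truth network would mirror the paper's claim that the explicitly regularized network generalizes better than the unregularized one.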