Limitations of Lazy Training of Two-layers Neural Network
Authors: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results are summarized by Figure 1, which compares the risk achieved by the three approaches above in the population limit n → ∞, using quadratic activations σ(u) = u² + c₀. Figure 1 reports the risk achieved by various approaches in numerical simulations and compares them with our theoretical predictions for each of the three regimes RF, NT, and NN, which are detailed in the next sections. For the experiments illustrated in Figures 1 and 2, we use a feature size of d = 450 and a number of hidden units N ∈ {45, ..., 4500}. NT and NN models are trained with SGD in TensorFlow [1]. We run a total of 2 × 10⁵ SGD steps for each (qf) model and 1.4 × 10⁵ steps for each (mg) model. |
| Researcher Affiliation | Academia | Behrooz Ghorbani, Department of Electrical Engineering, Stanford University, ghorbani@stanford.edu; Song Mei, ICME, Stanford University, songmei@stanford.edu; Theodor Misiakiewicz, Department of Statistics, Stanford University, misiakie@stanford.edu; Andrea Montanari, Department of Electrical Engineering and Department of Statistics, Stanford University, montanar@stanford.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link or explicit statement of code release) for the source code of the methodology described. |
| Open Datasets | No | The paper describes synthetic data models (e.g., 'Feature vectors xi are d-dimensional Gaussians', 'mixture of two d-dimensional centered Gaussians') but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions evaluating 'test error' on 'fresh samples' and discusses the population limit n → ∞, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages, sample counts for a validation set, or citations to predefined splits) needed to reproduce the data partitioning, in particular with respect to a separate validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'SGD in Tensor Flow [1]' but does not provide a specific version number for TensorFlow or any other software dependency. |
| Experiment Setup | Yes | We run a total of 2 × 10⁵ SGD steps for each (qf) model and 1.4 × 10⁵ steps for each (mg) model. The SGD batch size is fixed at 100 and the step size is chosen from the grid {0.001, ..., 0.03}, where the hyper-parameter that achieves the best fit is used for the figures. (A minimal re-implementation sketch of this setup follows the table.) |
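
The paper reports training the NT and NN models with SGD in TensorFlow but, as noted above, releases no code. The following is therefore a minimal sketch, assuming a standard Keras two-layer network with the quadratic activation σ(u) = u² + c₀, batch size 100, and a step-size grid with the reported endpoints. The data generator, the value of c₀, the hidden width, the epoch count, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical re-implementation sketch; the paper releases no code, so the data
# generator, c0, hidden width, epoch count, and all names below are assumptions.
import numpy as np
import tensorflow as tf

d = 450                  # feature dimension reported in the paper
N = 450                  # hidden width (the paper sweeps N in {45, ..., 4500})
c0 = 0.1                 # offset in sigma(u) = u^2 + c0 (value assumed)
batch_size = 100         # SGD batch size reported in the paper
lr_grid = [0.001, 0.003, 0.01, 0.03]  # grid endpoints from the paper; interior points assumed

def quadratic_activation(u):
    # sigma(u) = u^2 + c0
    return tf.square(u) + c0

def make_two_layer_net():
    # Two-layer network: one hidden layer with quadratic activation, linear readout.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(N, activation=quadratic_activation),
        tf.keras.layers.Dense(1),
    ])

# Toy Gaussian features with a quadratic target, standing in for the paper's
# synthetic (qf)/(mg) data models, which are not reproduced exactly here.
n_train = 10_000
X = np.random.randn(n_train, d).astype(np.float32)
y = np.sum(X[:, :10] ** 2, axis=1, keepdims=True).astype(np.float32)

# Select the step size that achieves the best fit, mirroring the paper's choice
# of the best hyper-parameter from the grid.
best_loss, best_lr = np.inf, None
for lr in lr_grid:
    model = make_two_layer_net()
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr), loss="mse")
    # The paper runs on the order of 2e5 SGD steps; a few epochs keep this sketch short.
    history = model.fit(X, y, batch_size=batch_size, epochs=5, verbose=0)
    final_loss = history.history["loss"][-1]
    if final_loss < best_loss:
        best_loss, best_lr = final_loss, lr

print(f"best step size on this toy problem: {best_lr} (final loss {best_loss:.4f})")
```

As written, the sketch only exercises the fully trained (NN) regime; the RF and NT regimes discussed in the paper would additionally require freezing the first-layer weights or linearizing the network around its initialization, respectively.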