Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Authors: Yuanzhi Li, Yingyu Liang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset. Section 7 (Experiments) aims at verifying some key implications: (1) the activation patterns of the hidden units couple with those at initialization; (2) the distance of the learned solution from the initialization is relatively small compared to the size of initialization; (3) the accumulated updates (i.e., the difference between the learned weight matrix and the initialization) have approximately low rank. These are indeed supported by the results on the synthetic data and on the MNIST data. (A hedged diagnostic sketch for these three implications appears after the table.) |
| Researcher Affiliation | Academia | Yuanzhi Li Computer Science Department Stanford University Stanford, CA 94305 yuanzhil@stanford.edu and Yingyu Liang Department of Computer Sciences University of Wisconsin-Madison Madison, WI 53706 yliang@cs.wisc.edu |
| Pseudocode | No | The paper describes the SGD update as a mathematical formula (1) but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the methodology described. |
| Open Datasets | Yes | on the benchmark dataset MNIST |
| Dataset Splits | No | The paper mentions '1000 training data points and 1000 test data points are sampled' for synthetic data but does not provide explicit training/validation/test dataset splits with percentages or specific counts for all datasets used, nor does it specify a validation set. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, required to replicate the experiment. |
| Experiment Setup | Yes | On the synthetic data, the SGD is run for T = 400 steps with batch size B = 16 and learning rate η = 10/m. On MNIST, the SGD is run for T = 2 × 10^4 steps with batch size B = 64 and learning rate η = 4 × 10^3/m. The weights are initialized with N(0, 1/m). (A hedged sketch of the synthetic-data setup follows the table.) |
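
The experiment-setup row quotes hyperparameters for a one-hidden-layer ReLU network trained by mini-batch SGD. Below is a minimal NumPy sketch of the synthetic-data configuration under those quoted values (T = 400, B = 16, η = 10/m, weights ~ N(0, 1/m)); the data generator (two separated Gaussian clusters), the input dimension d, the width m, the squared loss, and the fixed random output layer `a` are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the quoted synthetic-data setup: a one-hidden-layer ReLU
# network with m hidden units trained by mini-batch SGD. Cluster generation, the
# dimensions d/m/k, and the loss are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

d, m, k = 20, 1024, 2            # input dim, hidden width, number of clusters (assumed)
T, B, eta = 400, 16, 10.0 / m    # steps, batch size, learning rate from the quoted setup

# Assumed structured data: two well-separated Gaussian clusters, one per label.
centers = rng.normal(size=(k, d))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
n = 1000
labels = rng.integers(0, k, size=n)
X = centers[labels] + 0.1 * rng.normal(size=(n, d))
Y = np.where(labels == 0, 1.0, -1.0)

# Hidden weights initialized with N(0, 1/m), as quoted; the fixed +/- 1/sqrt(m)
# output layer is an assumption made for the sketch.
W0 = rng.normal(scale=np.sqrt(1.0 / m), size=(m, d))
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
W = W0.copy()

def forward(W, x):
    h = np.maximum(W @ x.T, 0.0)          # ReLU activations, shape (m, batch)
    return a @ h                          # network outputs, shape (batch,)

for t in range(T):
    idx = rng.integers(0, n, size=B)
    xb, yb = X[idx], Y[idx]
    h = W @ xb.T                          # pre-activations, shape (m, B)
    act = (h > 0).astype(float)           # ReLU activation pattern
    err = a @ np.maximum(h, 0.0) - yb     # squared-loss residual (loss choice assumed)
    grad = ((a[:, None] * act) * err[None, :]) @ xb / B
    W -= eta * grad                       # mini-batch SGD step on the hidden layer

print("final train error:", np.mean(np.sign(forward(W, X)) != np.sign(Y)))
```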
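
The three implications quoted in the Research Type row (activation patterns coupled with initialization, small relative distance from initialization, and approximately low-rank accumulated updates) can each be checked with simple diagnostics. The helper below is a hypothetical sketch of such checks; the function name, the 99% spectral-energy threshold used as an effective-rank proxy, and the reuse of `W0`, `W`, and `X` from the sketch above are all assumptions rather than details from the paper.

```python
# Hypothetical diagnostics for the three quoted implications. Assumes W0 (initial
# weights), W (learned weights), and data matrix X from a run such as the sketch above.
import numpy as np

def init_diagnostics(W0, W, X, energy=0.99):
    # (1) fraction of (unit, example) ReLU activation patterns that differ from init
    flip_rate = np.mean((W0 @ X.T > 0) != (W @ X.T > 0))

    # (2) distance of the learned weights from initialization, relative to the init norm
    rel_dist = np.linalg.norm(W - W0) / np.linalg.norm(W0)

    # (3) one possible proxy for the rank of the accumulated update: number of singular
    #     values needed to capture `energy` of the update's total spectral energy
    s = np.linalg.svd(W - W0, compute_uv=False)
    eff_rank = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1

    return flip_rate, rel_dist, eff_rank

# Example usage with the variables from the training sketch:
# flip_rate, rel_dist, eff_rank = init_diagnostics(W0, W, X)
# print(f"flip rate {flip_rate:.3f}, relative distance {rel_dist:.3f}, "
#       f"effective rank {eff_rank} of {min(W.shape)}")
```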