Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
Authors: Samet Oymak, Mahdi Soltanolkotabi
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify our theoretical claims, we conducted experiments on MNIST classification and low-rank matrix regression. To illustrate the tradeoffs between the loss function and the distance to the initial point, we define normalized misfit and normalized distance as follows. (Section 5, Numerical Experiments) |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, University of California, Riverside 2Department of Electrical and Computer Engineering, University of Southern California. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We consider MNIST digit classification task and use a standard LeNet model (LeCun et al., 1998) from Tensorflow (Abadi et al., 2016). |
| Dataset Splits | No | The paper mentions 'training' and 'test errors' in the context of MNIST experiments, but does not provide specific percentages or sample counts for training, validation, or test dataset splits. For synthetic low-rank regression, it only mentions varying sample size 'n'. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' and 'Adam' but does not specify their version numbers or any other software dependencies with specific versions. |
| Experiment Setup | Yes | Both experiments use Adam with learning rate 0.001 and batch size 100 for 1000 iterations. At each iteration, we record the normalized misfit and distance to obtain a misfit-distance trajectory similar to Figure 1. |
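The quoted setup (1000 iterations, recording a misfit-distance trajectory at each step) can be illustrated with a minimal sketch. This is not the paper's code: it uses plain gradient descent on a toy overparameterized least-squares problem instead of Adam on LeNet/MNIST, and it assumes normalizations of the form ‖f(θ) − y‖ / ‖y‖ for the misfit and ‖θ_t − θ_0‖ for the distance to the initial point, since the paper's exact definitions are not quoted above.

```python
import numpy as np

# Hedged sketch of recording a misfit-distance trajectory during training.
# Assumptions (not from the paper): toy linear model, plain gradient descent,
# misfit normalized by ||y||, distance measured as ||theta_t - theta_0||.
rng = np.random.default_rng(0)
n, d = 20, 100                 # n < d: overparameterized regime
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

theta0 = np.zeros(d)
theta = theta0.copy()
lr = 0.001                     # learning rate matching the quoted setup
trajectory = []                # (normalized misfit, distance to init) per step
for _ in range(1000):          # 1000 iterations, as in the quoted setup
    residual = X @ theta - y
    theta = theta - lr * (X.T @ residual)   # gradient step on 0.5*||X@theta - y||^2
    misfit = np.linalg.norm(X @ theta - y) / np.linalg.norm(y)
    dist = np.linalg.norm(theta - theta0)
    trajectory.append((misfit, dist))

print(f"final normalized misfit: {trajectory[-1][0]:.2e}")
print(f"final distance to init:  {trajectory[-1][1]:.4f}")
```

Plotting `trajectory` (distance on one axis, misfit on the other) reproduces the kind of misfit-distance curve the experiment setup describes: the misfit shrinks while the iterate moves a bounded distance from initialization.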