Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?

Authors: Samet Oymak, Mahdi Soltanolkotabi

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify our theoretical claims, we conducted experiments on MNIST classification and low-rank matrix regression. To illustrate the tradeoffs between the loss function and the distance to the initial point, we define normalized misfit and normalized distance as follows. (Section 5, Numerical Experiments) The definitions themselves are sketched below the table.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, University of California, Riverside; (2) Department of Electrical and Computer Engineering, University of Southern California.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We consider MNIST digit classification task and use a standard LeNet model (LeCun et al., 1998) from TensorFlow (Abadi et al., 2016).
Dataset Splits | No | The paper mentions 'training' and 'test errors' in the context of the MNIST experiments, but does not provide specific percentages or sample counts for training, validation, or test splits. For the synthetic low-rank regression, it only mentions varying the sample size 'n'.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using 'TensorFlow' and 'Adam' but does not specify version numbers or any other software dependencies with specific versions.
Experiment Setup | Yes | Both experiments use Adam with learning rate 0.001 and batch size 100 for 1000 iterations. At each iteration, we record the normalized misfit and distance to obtain a misfit-distance trajectory similar to Figure 1. (A training-loop sketch based on this setup follows the table.)
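The sentence quoted in the Research Type row ends just before the definitions themselves. One natural reading, consistent with a trajectory that starts near one at initialization, is sketched below; the exact normalization used in Section 5 of the paper is an assumption here, not a verbatim reproduction.

```latex
% Hypothetical definitions (assumed, not quoted from the paper):
% normalized misfit and normalized distance at iteration \tau,
% measured relative to the initial point \theta_0.
\[
  \text{normalized misfit}(\tau)
    = \frac{\lVert f(\theta_\tau) - y \rVert_{\ell_2}}
           {\lVert f(\theta_0) - y \rVert_{\ell_2}},
  \qquad
  \text{normalized distance}(\tau)
    = \frac{\lVert \theta_\tau - \theta_0 \rVert_{\ell_2}}
           {\lVert \theta_0 \rVert_{\ell_2}}.
\]
```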
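The Experiment Setup row gives enough detail (Adam, learning rate 0.001, batch size 100, 1000 iterations, misfit and distance recorded at each iteration) to sketch the recording loop. The sketch below is a hedged reconstruction, not the authors' code: the particular LeNet variant, the least-squares form of the misfit, and the normalization by the initial misfit and by the norm of the initialization are all hypothetical choices made here for illustration.

```python
# Minimal sketch of the misfit-distance trajectory recording, assuming a
# least-squares reading of the misfit (network outputs vs. one-hot labels)
# and a LeNet-style Keras model. Loss form and normalizations are assumptions.
import numpy as np
import tensorflow as tf

# --- Data: MNIST, rescaled to [0, 1], labels one-hot encoded ----------------
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10).astype("float32")

# --- Model: a LeNet-style convolutional network (hypothetical variant) ------
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Conv2D(16, 5, activation="relu"),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),
    tf.keras.layers.Dense(84, activation="relu"),
    tf.keras.layers.Dense(10),
])

theta0 = [w.numpy().copy() for w in model.trainable_variables]  # initial point
theta0_norm = np.sqrt(sum(np.sum(w ** 2) for w in theta0))

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)       # as reported
batch_size, num_iters = 100, 1000                               # as reported

# Reference misfit at initialization, used to normalize (an assumption).
init_residual = model(x_train[:batch_size]) - y_train[:batch_size]
init_misfit = float(tf.norm(init_residual))

trajectory = []  # list of (normalized misfit, normalized distance) pairs
for it in range(num_iters):
    idx = np.random.choice(len(x_train), batch_size, replace=False)
    xb, yb = x_train[idx], y_train[idx]

    with tf.GradientTape() as tape:
        residual = model(xb, training=True) - yb     # f(theta) - y
        loss = 0.5 * tf.reduce_sum(residual ** 2)    # least-squares objective
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Record the misfit-distance pair for this iteration.
    misfit = float(tf.norm(model(xb) - yb)) / init_misfit
    dist = np.sqrt(sum(np.sum((w.numpy() - w0) ** 2)
                       for w, w0 in zip(model.trainable_variables, theta0)))
    trajectory.append((misfit, dist / theta0_norm))
```

Plotting `trajectory` with normalized distance on the horizontal axis and normalized misfit on the vertical axis would yield the kind of misfit-distance curve the row describes.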