Understanding Gradient Descent on the Edge of Stability in Deep Learning
Authors: Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The above theoretical results have been corroborated by an experimental study. |
| Researcher Affiliation | Academia | Princeton University |
| Pseudocode | Yes | Algorithm 1 Perturbed Normalized Gradient Descent; Algorithm 3 Perturbed Gradient Descent on √L |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We perform our experiments on a VGG-16 model (Simonyan & Zisserman, 2014) trained on CIFAR-10 dataset (Krizhevsky et al.) with Normalized GD and GD with √L. |
| Dataset Splits | No | The paper mentions using a sample of training data but does not specify train/validation/test splits, percentages, or counts for any dataset. |
| Hardware Specification | No | The paper mentions training on a "single GPU" but does not specify any particular GPU model, CPU, or other hardware details. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The network had 784 hidden units, with GeLU activation function (Hendrycks & Gimpel, 2016). We used the loss function L as the mean squared loss to ensure the existence of minimizers and thus the manifold. For efficient training on a single GPU, we consider a sample of 1000 randomly selected points from the training data. |
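
The pseudocode and experiment-setup rows above describe training a small GeLU network with mean-squared loss using Normalized GD. Since the paper releases no code, the sketch below is only an illustrative approximation of such a setup: the stand-in data, step size `eta`, number of steps, and layer shapes beyond the 784 hidden units are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a 1000-point subsample of CIFAR-10 (flattened 3x32x32 inputs,
# one-hot targets); the real experiments draw these from the actual dataset.
x = torch.randn(1000, 3 * 32 * 32)
y = F.one_hot(torch.randint(0, 10, (1000,)), num_classes=10).float()

model = nn.Sequential(
    nn.Linear(3 * 32 * 32, 784),  # 784 hidden units, as in the setup row
    nn.GELU(),                    # GeLU activation (Hendrycks & Gimpel, 2016)
    nn.Linear(784, 10),
)
loss_fn = nn.MSELoss()  # mean squared loss, as in the setup row
eta = 0.01              # step size (assumed value)

for step in range(100):  # number of steps is an assumption
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    # Normalized GD: scale the full gradient by its global Euclidean norm,
    # so every update has length eta regardless of the loss scale.
    grads = [p.grad for p in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    with torch.no_grad():
        for p in model.parameters():
            p -= eta * p.grad / (norm + 1e-12)
```

The defining choice here is that each update divides the full gradient by its global Euclidean norm, so every step has the same length no matter how small the loss becomes; the perturbed variants named in the pseudocode row additionally add small random perturbations, which are omitted in this sketch.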