Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Authors: Alex Damian, Eshaan Nichani, Jason D. Lee
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify that the predicted dynamics defined in eq. (5) accurately capture the dynamics of gradient descent at the edge of stability by replicating the experiments in (Cohen et al., 2021) and tracking the deviation of gradient descent from the constrained trajectory. In Figure 3, we evaluate our theory on a 3-layer MLP and a 3-layer CNN trained with mean squared error (MSE) on a 5k subset of CIFAR10 and a 2-layer Transformer (Vaswani et al., 2017) trained with MSE on SST2 (Socher et al., 2013). (A sketch of tracking the sharpness that signals instability appears after the table.) |
| Researcher Affiliation | Academia | Alex Damian*, Eshaan Nichani* & Jason D. Lee; Princeton University; {ad27,eshnich,jasonlee}@princeton.edu |
| Pseudocode | Yes | "Definition 6 (Predicted Dynamics, full). Define $v_0^\dagger = v_0$, and let $x_t^\dagger = v_t^\dagger \cdot u_t$, $y_t^\dagger = \nabla S_t^\top v_t^\dagger$. Then $v_{t+1}^\dagger = P^\perp_{u_{t+1}}(I - \eta \nabla^2 L_t)P^\perp_{u_t} v_t^\dagger + \eta P^\perp_{u_{t+1}} \nabla S_t - (1 + \eta y_t^\dagger)\,x_t^\dagger\,u_{t+1}$ (6)" and "Definition 7. Given a vector $v$ and a timestep $t$, define $\mathrm{step}_t(v)$ by $P^\perp_{u_{t+1}}\,\mathrm{step}_t(v) = P^\perp_{u_{t+1}}\big[(I - \eta \nabla^2 L_t)P^\perp_{u_t} v + \eta \nabla S_t\big]$, $u_{t+1}^\top \mathrm{step}_t(v) = -(1 + \eta y)\,x$. (8)" (A hedged sketch of this update appears after the table.) |
| Open Source Code | Yes | Our code can be found at https://github.com/adamian98/EOS. |
| Open Datasets | Yes | We evaluate our theory on a 3-layer MLP and a 3-layer CNN trained with mean squared error (MSE) on a 5k subset of CIFAR10 and a 2-layer Transformer (Vaswani et al., 2017) trained with MSE on SST2 (Socher et al., 2013). |
| Dataset Splits | No | We evaluate our theory on a 3-layer MLP and a 3-layer CNN trained with mean squared error (MSE) on a 5k subset of CIFAR10 and a 2-layer Transformer (Vaswani et al., 2017) trained with MSE on SST2 (Socher et al., 2013). (This gives training-set sizes but no explicit train/validation/test splits, e.g., percentages or counts.) |
| Hardware Specification | No | All experiments were conducted on two servers, each with 10 NVIDIA GPUs. (This gives the GPU vendor and count, but not the specific GPU model or other hardware details such as CPU or memory.) |
| Software Dependencies | No | Our experiments were conducted in JAX (Bradbury et al., 2018), using https://github.com/locuslab/edge-of-stability as a reference for replicating the experimental setup used in (Cohen et al., 2021). (JAX is named but no version numbers are given.) |
| Experiment Setup | Yes | "For every experiment, we tracked the gradient descent dynamics until they reached instability and then began tracking the constrained trajectory, gradient descent, gradient flow, and both our predicted dynamics (Section 5) and our generalized predicted dynamics (Appendix F)... we switched to computing gradients with 64-bit precision after first reaching instability to avoid propagating floating point errors." and "MLP+MSE on CIFAR10, η = 0.002" (from Figure 3 caption). (A sketch of the precision switch appears after the table.) |
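
Replicating the (Cohen et al., 2021) experiments requires tracking the sharpness, i.e. the largest eigenvalue of the training-loss Hessian, so that the step at which gradient descent first becomes unstable (sharpness exceeding 2/η) can be detected. The snippet below is a minimal, hypothetical sketch of one common way to do this in JAX via power iteration on Hessian-vector products; it is not taken from the authors' repository, and `loss_fn` (a scalar function of the parameter pytree, with the data closed over) and `params` are placeholder names.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def sharpness(loss_fn, params, n_iters=20, seed=0):
    """Estimate the top Hessian eigenvalue of loss_fn at params (the sharpness)."""
    flat, unravel = ravel_pytree(params)
    # Flattened gradient as a function of the flattened parameter vector.
    grad_flat = lambda p: ravel_pytree(jax.grad(loss_fn)(unravel(p)))[0]

    def hvp(v):
        # Hessian-vector product via forward-over-reverse differentiation.
        return jax.jvp(grad_flat, (flat,), (v,))[1]

    # Power iteration on the Hessian, starting from a random direction.
    v = jax.random.normal(jax.random.PRNGKey(seed), flat.shape)
    for _ in range(n_iters):
        v = hvp(v)
        v = v / jnp.linalg.norm(v)
    return jnp.vdot(v, hvp(v))  # Rayleigh quotient ≈ largest eigenvalue
```

A run would then flag instability at the first step where `sharpness(...)` exceeds `2 / eta`.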
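
The Definition 6/7 update quoted in the Pseudocode row can be written as a single linear-algebra step. The sketch below is a hedged reading of eq. (6) as reconstructed above, not the authors' implementation: it assumes the Hessian `hess_L` (∇²L_t), the sharpness gradient `grad_S` (∇S_t), and the unit top eigenvectors `u_t`, `u_tp1` are supplied as dense arrays.

```python
import jax.numpy as jnp

def predicted_step(v, u_t, u_tp1, hess_L, grad_S, eta):
    """One step of the predicted deviation dynamics, v_t^† -> v_{t+1}^† (eq. 6)."""
    def proj_perp(u, w):
        # Project w onto the orthogonal complement of the unit vector u.
        return w - jnp.dot(u, w) * u

    x = jnp.dot(v, u_t)     # x_t^†: displacement along the top eigenvector
    y = jnp.dot(grad_S, v)  # y_t^†: first-order change in sharpness
    # Component orthogonal to u_{t+1}: linearized GD step plus the η∇S term.
    v_perp = proj_perp(u_t, v)
    ortho = proj_perp(u_tp1, v_perp - eta * (hess_L @ v_perp) + eta * grad_S)
    # Component along u_{t+1}: the period-2 oscillation -(1 + ηy_t^†) x_t^†.
    return ortho - (1.0 + eta * y) * x * u_tp1
```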
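
The Experiment Setup row notes that gradients were computed in 64-bit precision after instability is first reached. The snippet below is a minimal sketch of one way to do that in JAX; the `jax_enable_x64` flag is standard JAX configuration, but `loss_fn`, `params`, and `batch` are placeholder names rather than the authors' code.

```python
import jax
import jax.numpy as jnp

# Allow 64-bit arrays; without this flag JAX silently keeps computations in float32.
jax.config.update("jax_enable_x64", True)

def high_precision_grad(loss_fn, params, batch):
    """Gradient of loss_fn in float64, used once the run has first reached instability."""
    to64 = lambda x: x.astype(jnp.float64) if jnp.issubdtype(x.dtype, jnp.floating) else x
    params64 = jax.tree_util.tree_map(to64, params)
    batch64 = jax.tree_util.tree_map(to64, batch)
    return jax.grad(loss_fn)(params64, batch64)
```

In a training loop this would only be invoked after the sharpness first exceeds 2/η, with ordinary 32-bit gradients used before that point.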