On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

Authors: Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical testing showing that the trajectory under SVAG converges and closely follows SGD, suggesting (in combination with the previous result) that the SDE approximation can be a meaningful approach to understanding the implicit bias of SGD in deep learning. We train PreResNet32 with BN on CIFAR-10 for 300 epochs, decaying by 0.1 at epoch 250.
Researcher Affiliation | Academia | Zhiyuan Li, Sadhika Malladi, Sanjeev Arora; Princeton University; {zhiyuanli,smalladi,arora}@cs.princeton.edu
Pseudocode | No | The paper describes the SVAG algorithm using mathematical equations and descriptive text, but it does not provide a clearly labeled pseudocode or algorithm block. (A hedged sketch of one SVAG step follows the table.)
Open Source Code | Yes | We provide our code at https://github.com/sadhikamalladi/svag.
Open Datasets | Yes | SGD with batch size 125 and NGD with matching covariance have close train and test curves when training on CIFAR-10. We train PreResNet32 with BN on CIFAR-10 for 300 epochs, decaying by 0.1 at epoch 250. (A torchvision sketch loading the standard CIFAR-10 splits follows the table.)
Dataset Splits | No | The paper mentions 'train' and 'test' curves/accuracy but does not specify a validation split or any split methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models, CPU types, memory); it discusses the experimental setup only at a higher level.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | All three settings use the same LR schedule: LR = 0.8 initially, decayed by 0.1 at epoch 250, with a total budget of 300 epochs. SGD with batch size 125 and NGD with matching covariance have close train and test curves when training on CIFAR-10. (A PyTorch sketch of this schedule follows the table.)
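
Since the paper gives no labeled pseudocode for SVAG (see the Pseudocode row), the following is a minimal sketch of one SVAG step reconstructed from the paper's update rule: gradients from two independent mini-batches are combined with coefficients (1 ± sqrt(2l − 1))/2, which keeps the gradient mean unchanged, scales the gradient-noise covariance by l, and pairs with a step size of lr / l. The function name `svag_step`, its argument layout, and the PyTorch style are illustrative assumptions, not the authors' implementation (their actual code is in the repository linked above).

```python
# Minimal sketch of one SVAG (Stochastic Variance Amplified Gradient) step.
# Assumption: the function name, arguments, and PyTorch style are illustrative;
# only the combination rule itself follows the paper's description.
import math
import torch

def svag_step(model, loss_fn, batch_a, batch_b, lr, l):
    """Apply one SVAG update with noise-amplification level l >= 1.

    The two coefficients sum to 1 (gradient mean unchanged) and their
    squares sum to l (gradient-noise covariance scaled by l), while the
    step size shrinks to lr / l; as l grows, the iterates approach the
    limiting SDE trajectory.
    """
    a = (1 + math.sqrt(2 * l - 1)) / 2
    b = (1 - math.sqrt(2 * l - 1)) / 2

    # Gradient on the first independent mini-batch.
    inputs_a, targets_a = batch_a
    model.zero_grad()
    loss_fn(model(inputs_a), targets_a).backward()
    grads_a = [p.grad.detach().clone() for p in model.parameters()]

    # Gradient on the second independent mini-batch.
    inputs_b, targets_b = batch_b
    model.zero_grad()
    loss_fn(model(inputs_b), targets_b).backward()
    grads_b = [p.grad.detach().clone() for p in model.parameters()]

    # SVAG update: x <- x - (lr / l) * (a * g_a + b * g_b).
    with torch.no_grad():
        for p, ga, gb in zip(model.parameters(), grads_a, grads_b):
            p.add_(a * ga + b * gb, alpha=-lr / l)
```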
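
The Open Datasets row refers to CIFAR-10, which ships with fixed train and test splits (consistent with the Dataset Splits row noting that no validation split is specified). A minimal torchvision loading sketch at the reported batch size of 125 is below; the ToTensor-only transform is a placeholder assumption, since the paper's augmentation choices are not quoted here.

```python
# Loading the standard CIFAR-10 train/test splits at batch size 125.
# Assumption: the transform is a placeholder, not the authors' pipeline.
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()  # placeholder; augmentation not specified here

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=125, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=125, shuffle=False)
```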
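
The schedule in the Experiment Setup row maps directly onto standard PyTorch utilities. Below is a minimal sketch of that schedule (LR 0.8, multiplied by 0.1 at epoch 250, 300 epochs total); the tiny placeholder model and the omitted training loop are assumptions, as the paper's PreResNet32 lives in the authors' repository.

```python
# Reported schedule: LR = 0.8, multiplied by 0.1 at epoch 250, 300 epochs total.
# Assumption: the tiny linear model stands in for PreResNet32 with BN on CIFAR-10.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.8)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[250], gamma=0.1)

for epoch in range(300):
    # ... one epoch of training on CIFAR-10 (batch size 125) goes here ...
    optimizer.step()   # stand-in so the scheduler follows an optimizer step
    scheduler.step()   # multiplies the LR by 0.1 once epoch 250 is reached
```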