Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate

Authors: Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate our findings on the phenomenon of benign oscillation.
Researcher Affiliation | Academia | Miao Lu (Stanford University), Beining Wu (University of Chicago), Xiaodong Yang (Harvard University), Difan Zou (University of Hong Kong)
Pseudocode | No | The paper describes procedures using mathematical equations and text but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Training and test performance of ResNet-18 on the CIFAR-10 dataset.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, specific percentages, or sample counts, nor does it refer to predefined splits with citations for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as ResNet-18 and SGD but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | We adopt the same configuration as in Andriushchenko et al. (2023): weight decay but no momentum and no data augmentation. A clear difference between large and small learning rate training can be observed: SGD with a large learning rate produces an oscillating training curve with higher test accuracy, whereas SGD with a small learning rate converges rapidly and smoothly but yields lower test accuracy.
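
The Experiment Setup row describes the training configuration only in words (ResNet-18 on CIFAR-10, SGD with weight decay, no momentum, no data augmentation, large vs. small learning rate). Below is a minimal PyTorch sketch of that configuration; the concrete hyperparameter values (learning rates, weight decay, batch size, epochs) are illustrative assumptions, since the report does not specify them.

```python
# Sketch of the quoted setup: SGD with weight decay, no momentum, no data augmentation.
# All numeric hyperparameters here are assumed placeholders, not values from the paper.
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms


def train_cifar10(lr, weight_decay=5e-4, epochs=1, batch_size=128, device="cpu"):
    # No data augmentation: only convert images to tensors.
    transform = transforms.ToTensor()
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=transform)
    loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True)

    model = torchvision.models.resnet18(num_classes=10).to(device)
    # Weight decay but no momentum, per the quoted configuration.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=lr, momentum=0.0, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model


# Large vs. small learning rate comparison; the values 0.1 and 0.001 are placeholders.
# model_large = train_cifar10(lr=0.1)    # oscillating training curve, reportedly higher test accuracy
# model_small = train_cifar10(lr=0.001)  # smooth convergence, reportedly lower test accuracy
```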