Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
Authors: Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate our findings on the phenomenon of benign oscillation. |
| Researcher Affiliation | Academia | Miao Lu*1, Beining Wu*2, Xiaodong Yang3, Difan Zou4; 1Stanford University, 2University of Chicago, 3Harvard University, 4University of Hong Kong |
| Pseudocode | No | The paper describes procedures using mathematical equations and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Training and test performance of ResNet-18 on the CIFAR-10 dataset |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, specific percentages, or sample counts, nor does it refer to predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'ResNet-18' and 'SGD' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We adopt the same configuration as in Andriushchenko et al. (2023): using weight decay but no momentum and no data augmentation. A clear difference between large-learning-rate and small-learning-rate training can be observed: SGD with a large learning rate leads to an oscillating training curve with higher test accuracy, while SGD with a small learning rate converges rapidly and smoothly but gives lower test accuracy. |
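
Since the paper releases no code, the following is only a minimal PyTorch sketch of the reported configuration: ResNet-18 trained on CIFAR-10 with plain SGD (weight decay, no momentum, no data augmentation), run once with a large and once with a small learning rate. The specific learning rates, epoch count, batch size, weight-decay value, and the use of torchvision's stock `resnet18` (rather than a CIFAR-adapted variant) are assumptions for illustration, not values taken from the paper.

```python
# Hypothetical reconstruction of the described setup; hyperparameters are assumptions.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms


def make_loaders(batch_size=128):
    # No data augmentation: only tensor conversion and standard CIFAR-10 normalization.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])
    train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader


def train(lr, epochs=50, weight_decay=5e-4):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    train_loader, test_loader = make_loaders()
    model = torchvision.models.resnet18(num_classes=10).to(device)
    # Plain SGD: weight decay but no momentum, matching the reported configuration.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # Track test accuracy each epoch to trace the (possibly oscillating) curve.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in test_loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.size(0)
        print(f"lr={lr} epoch={epoch + 1} test_acc={correct / total:.4f}")


if __name__ == "__main__":
    for lr in (0.1, 0.001):  # assumed "large" vs "small" learning rates
        train(lr)
```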