Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
Authors: Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate our findings on the phenomenon of benign oscillation. |
| Researcher Affiliation | Academia | Miao Lu*1, Beining Wu*2, Xiaodong Yang3, Difan Zou4; 1Stanford University, 2University of Chicago, 3Harvard University, 4University of Hong Kong |
| Pseudocode | No | The paper describes procedures using mathematical equations and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Training and test performance of ResNet-18 on CIFAR-10 dataset |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, specific percentages, or sample counts, nor does it refer to predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'ResNet-18' and 'SGD' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We adopt the same configuration as in Andriushchenko et al. (2023): using weight decay but no momentum and no data augmentation. A clear difference between the large learning rate training and small learning rate training can be observed: SGD with a large learning rate leads to an oscillating training curve with higher testing accuracy; SGD with a small learning rate has a rapid and smooth convergence but gives lower testing accuracy. |
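
The update rule implied by the Experiment Setup row (weight decay, no momentum) can be sketched on a toy problem. The snippet below is only an illustration under stated assumptions: it uses full-batch gradient descent on a one-dimensional quadratic, not ResNet-18 on CIFAR-10, and the learning rates, curvature, and weight decay value are hypothetical choices that do not come from the paper. It shows how a large step size can produce sign-flipping (oscillating) iterates while a small one converges smoothly; it does not reproduce the reported difference in test accuracy.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One plain gradient step with L2 weight decay and no momentum."""
    return w - lr * (grad + weight_decay * w)


def run(lr, steps=20, curvature=1.0, weight_decay=5e-4, w0=1.0):
    """Minimize the toy loss 0.5 * curvature * w**2 and record every iterate.

    All default values are hypothetical, chosen only for illustration.
    """
    w = w0
    trajectory = [w]
    for _ in range(steps):
        grad = curvature * w  # exact gradient of the toy quadratic
        w = sgd_step(w, grad, lr, weight_decay)
        trajectory.append(w)
    return trajectory


# Hypothetical step sizes: below 1/curvature the iterates shrink monotonically;
# between 1/curvature and 2/curvature they flip sign each step while shrinking.
small = run(lr=0.1)
large = run(lr=1.8)

print("small lr, first iterates:", [round(w, 4) for w in small[:5]])
print("large lr, first iterates:", [round(w, 4) for w in large[:5]])
```

On this quadratic each step multiplies `w` by `1 - lr * (curvature + weight_decay)`, so the sign of that factor decides whether the trajectory is smooth or oscillating.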