Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Accelerating SGD with momentum for over-parameterized learning
Authors: Chaoyue Liu, Mikhail Belkin
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation of MaSS for several standard architectures of deep networks, including ResNet and convolutional networks, shows improved performance over SGD, SGD+Nesterov and Adam. |
| Researcher Affiliation | Academia | Chaoyue Liu Department of Computer Science The Ohio State University Columbus, OH 43210 EMAIL Mikhail Belkin Department of Computer Science The Ohio State University Columbus, OH 43210 EMAIL |
| Pseudocode | Yes | A PSEUDOCODE FOR MASS. Algorithm 1: MaSS (Momentum-added Stochastic Solver) |
| Open Source Code | Yes | Code URL: https://github.com/ts66395/MaSS |
| Open Datasets | Yes | Real data: MNIST and CIFAR-10. We compare the optimization performance of SGD, SGD+Nesterov and MaSS on the following tasks: classification of MNIST with a fully-connected network (FCN), classification of CIFAR-10 with a convolutional neural network (CNN) and Gaussian kernel regression on MNIST. |
| Dataset Splits | No | The paper mentions training and testing but does not specify details about a validation dataset split or how it was used. |
| Hardware Specification | No | GPUs donated by Nvidia were used for the experiments. (This is too general, lacking specific models or types.) |
| Software Dependencies | No | The paper does not specify versions for any software components used in the experiments (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | All algorithms are implemented with mini-batches of size 64 for neural network training. In each task, we use the same initial learning rate for MaSS, SGD and SGD+Nesterov, and run the same number of epochs (150 epochs for CNN and 300 epochs for ResNet-32). CNN: η = 0.01 (initial), α = 0.05, κ_m = 3; η = 0.3 (initial), α = 0.05, κ_m = 6. ResNet-32: η = 0.1 (initial), α = 0.05, κ_m = 2; η = 0.3 (initial), α = 0.05, κ_m = 24. |
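
To make the hyperparameters in the Experiment Setup row more concrete, the following is a minimal sketch of a MaSS-style update in plain Python. It is not the authors' implementation (see the code URL above for that). The mapping from (η, α, κ_m) to the two step sizes and the momentum coefficient is an assumption that this summary does not state, and it should be checked against the paper. The function names and the toy diagonal quadratic are hypothetical, and the demo uses exact gradients rather than mini-batches so that it stays deterministic.

```python
def mass_hyperparams(eta, alpha, kappa_m):
    """Map (eta, alpha, kappa_m) to (eta1, eta2, gamma).

    ASSUMPTION: this mapping is not given in the summary table; verify it
    against the paper or the released code before relying on it.
    """
    eta1 = eta
    eta2 = eta * (1.0 - 1.0 / kappa_m) / (1.0 + alpha)
    gamma = (1.0 - alpha) / (1.0 + alpha)
    return eta1, eta2, gamma


def mass_step(w, u, grad_u, eta1, eta2, gamma):
    """One update: a Nesterov-style step plus a compensation term eta2 * grad."""
    # Descent step taken from the look-ahead point u.
    w_next = [ui - eta1 * gi for ui, gi in zip(u, grad_u)]
    # Momentum extrapolation, with the extra gradient term added back.
    u_next = [(1.0 + gamma) * wn - gamma * wi + eta2 * gi
              for wn, wi, gi in zip(w_next, w, grad_u)]
    return w_next, u_next


def run_demo(steps=200):
    """Minimize a hypothetical diagonal quadratic with exact gradients."""
    curvatures = [1.0, 0.5, 0.1]  # toy problem, not from the paper

    def loss(w):
        return 0.5 * sum(c * wi * wi for c, wi in zip(curvatures, w))

    def grad(u):
        return [c * ui for c, ui in zip(curvatures, u)]

    # alpha and kappa_m follow the first CNN setting in the table; eta = 1.0
    # is chosen for this toy problem (1 / largest curvature), not from the paper.
    eta1, eta2, gamma = mass_hyperparams(eta=1.0, alpha=0.05, kappa_m=3)
    w = [1.0, 1.0, 1.0]
    u = list(w)
    initial = loss(w)
    for _ in range(steps):
        w, u = mass_step(w, u, grad(u), eta1, eta2, gamma)
    return initial, loss(w)


if __name__ == "__main__":
    initial, final = run_demo()
    print(f"initial loss = {initial:.3e}, final loss = {final:.3e}")
```

In a training loop, `grad_u` would be a mini-batch gradient evaluated at `u` (batch size 64 in the reported setup), and η would follow whatever learning-rate schedule is used, which the table only describes as "initial".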