Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Momentum-Based Variance Reduction in Non-Convex SGD
Authors: Ashok Cutkosky, Francesco Orabona
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present some empirical results in Section 6 and conclude with a discussion in Section 7. |
| Researcher Affiliation | Collaboration | Ashok Cutkosky, Google Research, Mountain View, CA, USA; Francesco Orabona, Boston University, Boston, MA, USA |
| Pseudocode | Yes | Algorithm 1 STORM: STOchastic Recursive Momentum |
| Open Source Code | Yes | https://github.com/google-research/google-research/tree/master/storm_optimizer |
| Open Datasets | Yes | We implemented STORM in TensorFlow [1] and tested its performance on the CIFAR-10 image recognition benchmark [14] using a ResNet model [10], as implemented by the Tensor2Tensor package [26]. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and MNIST datasets but does not explicitly state the training, validation, and test splits used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | We implemented STORM in TensorFlow [1]... The paper mentions TensorFlow but does not provide specific version numbers for TensorFlow or any other software libraries. |
| Experiment Setup | Yes | The learning rates for AdaGrad and Adam were swept over a logarithmically spaced grid. For STORM, we set w = k = 0.1 as a default and swept c over a logarithmically spaced grid, so that all algorithms involved only one parameter to tune. No regularization was employed. |
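
The Experiment Setup row describes fixing w = k = 0.1 and sweeping c over a logarithmically spaced grid. The following is a minimal sketch of what such a sweep could look like, not the authors' implementation. The grid bounds (1 to 1000), the number of grid points, the toy objective f(x) = 0.5·x², the noise level, and the helper names `log_grid` and `storm_toy` are all assumptions made for illustration. The recursive-momentum update is written from the commonly stated form of STORM, with the momentum weight clipped to at most 1 as a further assumption; Algorithm 1 in the paper and the linked repository remain the authoritative versions.

```python
import random


def log_grid(low, high, num):
    """Return `num` logarithmically spaced values from `low` to `high` (hypothetical helper)."""
    ratio = high / low
    return [low * ratio ** (i / (num - 1)) for i in range(num)]


def storm_toy(c, w=0.1, k=0.1, steps=200, seed=0):
    """Run a STORM-style update on the toy objective f(x) = 0.5 * x**2.

    Stochastic gradients are x + noise; the noise scale 0.1 is an assumption.
    Returns the final objective value.
    """
    rng = random.Random(seed)
    x = 1.0
    d = x + rng.gauss(0.0, 0.1)      # first momentum term is the first stochastic gradient
    sum_g2 = d * d                   # running sum of squared gradient norms
    for _ in range(steps):
        eta = k / (w + sum_g2) ** (1.0 / 3.0)   # adaptive step size
        x_new = x - eta * d
        a = min(1.0, c * eta ** 2)              # momentum weight, clipped (assumption)
        noise = rng.gauss(0.0, 0.1)             # one fresh sample, used at both iterates
        g_new = x_new + noise
        g_old = x + noise
        d = g_new + (1.0 - a) * (d - g_old)     # recursive momentum correction
        sum_g2 += g_new * g_new
        x = x_new
    return 0.5 * x * x


# Sweep only c, keeping w and k at the default of 0.1 quoted in the table.
c_grid = log_grid(1.0, 1000.0, 4)
results = {c: storm_toy(c) for c in c_grid}
best_c = min(results, key=results.get)
```

Because only c varies, the sweep has the same single tuning dimension as a learning-rate sweep for AdaGrad or Adam, which is the comparison the table entry describes.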