Momentum-Based Variance Reduction in Non-Convex SGD
Authors: Ashok Cutkosky, Francesco Orabona
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present some empirical results in Section 6 and conclude with a discussion in Section 7. |
| Researcher Affiliation | Collaboration | Ashok Cutkosky, Google Research, Mountain View, CA, USA (ashok@cutkosky.com); Francesco Orabona, Boston University, Boston, MA, USA (francesco@orabona.com) |
| Pseudocode | Yes | Algorithm 1 STORM: STOchastic Recursive Momentum |
| Open Source Code | Yes | https://github.com/google-research/google-research/tree/master/storm_optimizer |
| Open Datasets | Yes | We implemented STORM in TensorFlow [1] and tested its performance on the CIFAR-10 image recognition benchmark [14] using a ResNet model [10], as implemented by the Tensor2Tensor package [26]. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and MNIST datasets but does not explicitly state the training, validation, and test splits used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | We implemented STORM in TensorFlow [1]... The paper mentions TensorFlow but does not provide specific version numbers for TensorFlow or any other software libraries. |
| Experiment Setup | Yes | The learning rates for AdaGrad and Adam were swept over a logarithmically spaced grid. For STORM, we set w = k = 0.1 as a default and swept c over a logarithmically spaced grid, so that all algorithms involved only one parameter to tune. No regularization was employed. |
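
The Pseudocode row above refers to Algorithm 1 (STORM). Below is a minimal NumPy sketch of that update rule, written from the paper's description rather than from the released TensorFlow code; the names (`storm`, `grad_fn`, `sample_fn`) and the clipping of the momentum parameter to [0, 1] are illustrative assumptions. The default w = k = 0.1 matches the quoted experiment setup.

```python
# Minimal sketch of the STORM update (Algorithm 1), assuming grad_fn(x, xi)
# returns a stochastic gradient and sample_fn() draws one data sample.
import numpy as np

def storm(grad_fn, sample_fn, x, T, k=0.1, w=0.1, c=1.0):
    xi = sample_fn()
    d = grad_fn(x, xi)                       # d_1 = grad f(x_1, xi_1)
    sum_G2 = np.linalg.norm(d) ** 2          # running sum of squared gradient norms
    for _ in range(T):
        eta = k / (w + sum_G2) ** (1.0 / 3.0)    # adaptive step size eta_t
        x_prev = x
        x = x - eta * d                          # x_{t+1} = x_t - eta_t * d_t
        a = min(c * eta ** 2, 1.0)               # a_{t+1} = c * eta_t^2 (clipped here as a safeguard)
        xi = sample_fn()                         # fresh sample xi_{t+1}
        g_new = grad_fn(x, xi)
        g_old = grad_fn(x_prev, xi)              # previous iterate, same fresh sample
        d = g_new + (1.0 - a) * (d - g_old)      # recursive momentum / variance reduction
        sum_G2 += np.linalg.norm(g_new) ** 2
    return x
```

The recursive term (1 - a)(d - g_old) is what distinguishes STORM from plain momentum: the gradient at the previous iterate is re-evaluated on the fresh sample, which is how the method obtains variance reduction without checkpoints or large batch sizes.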
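
The Experiment Setup row describes a one-parameter sweep per optimizer. A hedged sketch of such a sweep is below; the grid bounds and number of points are assumptions (the paper's exact grid is not quoted), and `train_and_eval` is a hypothetical helper standing in for one CIFAR-10 training run.

```python
# Hypothetical hyperparameter sweep mirroring the quoted setup: AdaGrad/Adam
# sweep the learning rate, STORM fixes w = k = 0.1 and sweeps c. Grid bounds
# and resolution are assumptions, not values reported in the table above.
import numpy as np

lr_grid = np.logspace(-5, 0, num=6)   # assumed grid for AdaGrad / Adam learning rates
c_grid = np.logspace(-2, 3, num=6)    # assumed grid for STORM's c

results = {}
for c in c_grid:
    # train_and_eval is a hypothetical helper: one CIFAR-10 run, returns validation accuracy.
    # results[c] = train_and_eval(optimizer="storm", k=0.1, w=0.1, c=c)
    pass
```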