Momentum-Based Variance Reduction in Non-Convex SGD

Authors: Ashok Cutkosky, Francesco Orabona

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present some empirical results in Section 6 and conclude with a discussion in Section 7.
Researcher Affiliation | Collaboration | Ashok Cutkosky (Google Research, Mountain View, CA, USA; ashok@cutkosky.com) and Francesco Orabona (Boston University, Boston, MA, USA; francesco@orabona.com)
Pseudocode | Yes | Algorithm 1 STORM: STOchastic Recursive Momentum (a minimal sketch of the update rule appears after this table)
Open Source Code | Yes | https://github.com/google-research/google-research/tree/master/storm_optimizer
Open Datasets | Yes | We implemented STORM in TensorFlow [1] and tested its performance on the CIFAR-10 image recognition benchmark [14] using a ResNet model [10], as implemented by the Tensor2Tensor package [26].
Dataset Splits | No | The paper mentions using the CIFAR-10 and MNIST datasets but does not explicitly state the training, validation, and test splits used.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | We implemented STORM in TensorFlow [1]... The paper mentions TensorFlow but does not provide specific version numbers for TensorFlow or any other software libraries.
Experiment Setup | Yes | The learning rates for AdaGrad and Adam were swept over a logarithmically spaced grid. For STORM, we set w = k = 0.1 as a default and swept c over a logarithmically spaced grid, so that all algorithms involved only one parameter to tune. No regularization was employed. (A sweep sketch follows the update-rule sketch below.)
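
The Pseudocode row refers to the paper's Algorithm 1 (STORM). As a reading aid, here is a minimal NumPy sketch of that update rule; the `grad(x, xi)` stochastic-gradient oracle, the clamp of the momentum coefficient at 1, and the default c value are assumptions made for this sketch, not the authors' released TensorFlow optimizer.

```python
import numpy as np

def storm(grad, x0, num_steps, k=0.1, w=0.1, c=100.0, seed=0):
    """Minimal sketch of the STORM update rule (Algorithm 1 of the paper).

    `grad(x, xi)` is a hypothetical oracle returning a stochastic gradient of
    the objective at `x` for the random sample `xi`. This is a reading aid,
    not the authors' released TensorFlow implementation.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    xi = rng.integers(1 << 30)            # first stochastic sample ξ_1
    d = grad(x, xi)                       # d_1 = ∇f(x_1, ξ_1)
    sum_sq = float(np.dot(d, d))          # running sum of squared gradient norms
    for _ in range(num_steps):
        eta = k / (w + sum_sq) ** (1.0 / 3.0)   # adaptive step size η_t
        x_prev, x = x, x - eta * d              # x_{t+1} = x_t - η_t d_t
        a = min(c * eta ** 2, 1.0)              # momentum a_{t+1} = c η_t^2 (clamped here)
        xi = rng.integers(1 << 30)              # fresh sample ξ_{t+1}
        g_new = grad(x, xi)                     # ∇f(x_{t+1}, ξ_{t+1})
        g_old = grad(x_prev, xi)                # ∇f(x_t, ξ_{t+1}), same sample, old iterate
        d = g_new + (1.0 - a) * (d - g_old)     # variance-reduced recursive momentum
        sum_sq += float(np.dot(g_new, g_new))
    return x
```

The point visible in the sketch is that each step evaluates the stochastic gradient at both the new and the previous iterate on the same sample ξ_{t+1}, which is what provides the variance reduction without large batches or checkpointed full gradients.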
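The Experiment Setup row states that STORM keeps w = k = 0.1 fixed and tunes only c over a logarithmically spaced grid. Below is a small sketch of that one-parameter protocol; the grid endpoints and the toy quadratic objective are illustrative stand-ins (the paper tunes c on CIFAR-10 with a ResNet), and `storm` refers to the sketch above.

```python
import numpy as np

def toy_loss(x):
    return 0.5 * float(np.dot(x, x))

def toy_grad(x, xi):
    # Noisy gradient of the toy quadratic; `xi` seeds the noise for reproducibility.
    noise = np.random.default_rng(xi).normal(scale=0.1, size=x.shape)
    return x + noise

# Sweep only c on a logarithmically spaced grid, with w = k = 0.1 held fixed.
best_c, best_loss = None, np.inf
for c in np.logspace(-2, 4, num=7):                      # assumed grid: 1e-2 ... 1e4
    x = storm(toy_grad, x0=np.ones(10), num_steps=500,   # `storm` from the sketch above
              k=0.1, w=0.1, c=c)
    loss = toy_loss(x)
    if loss < best_loss:
        best_c, best_loss = c, loss
print(f"best c = {best_c:g}, final toy loss = {best_loss:.4f}")
```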