Variants of RMSProp and Adagrad with Logarithmic Regret Bounds

Authors: Mahesh Chandra Mukkamala, Matthias Hein

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate in the experiments that these new variants outperform other adaptive gradient techniques or stochastic gradient descent in the optimization of strongly convex functions as well as in training of deep neural networks.
Researcher Affiliation | Academia | Department of Mathematics and Computer Science, Saarland University, Germany; IMPRS-CS, Max Planck Institute for Informatics, Saarbrücken, Germany.
Pseudocode | Yes | Algorithm 1 (Adagrad); Algorithm 2 (SC-Adagrad); Algorithm 3 (RMSProp); Algorithm 4 (SC-RMSProp). A hedged sketch of the SC-Adagrad update follows the table.
Open Source Code | No | The paper does not provide links to, or explicit statements about, open-source code for the described methodology.
Open Datasets | Yes | Datasets: We use three datasets where it is easy, difficult and very difficult to achieve good test performance, just in order to see if this influences the performance. For this purpose we use MNIST (60000 training samples, 10 classes), CIFAR10 (50000 training samples, 10 classes) and CIFAR100 (50000 training samples, 100 classes). We refer to (Krizhevsky, 2009) for more details on the CIFAR datasets. A hedged loading sketch follows the table.
Dataset Splits | No | The paper reports training sample counts for MNIST (60000), CIFAR10 (50000), and CIFAR100 (50000) and refers to Krizhevsky (2009) for CIFAR details, which implies the standard train/test splits. It does not, however, describe a validation set, its size, or any splitting methodology beyond the training counts.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper does not name ancillary software, such as libraries or solvers with version numbers.
Experiment Setup | Yes | Note that all methods have only one varying parameter: the stepsize α which we choose from the set of {1, 0.1, 0.01, 0.001, 0.0001} for all experiments. ... The decaying damping factor for both SC-Adagrad and SC-RMSProp is used with ξ1 = 0.1, ξ2 = 1 for convex problems and we use ξ1 = 0.1, ξ2 = 0.1 for non-convex deep learning problems. Finally, the numerical stability parameter δ used in Adagrad, Adam, RMSProp is set to 10^-8 as it is typically recommended for these algorithms. A hedged configuration sketch follows the table.
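For orientation on the Pseudocode entry: SC-Adagrad (and likewise SC-RMSProp) differs from the standard variant mainly in that the per-coordinate denominator drops the square root and uses a decaying damping term instead of a fixed ε. The snippet below is a minimal NumPy sketch under that reading of the paper, not the authors' code; grad_fn is a hypothetical placeholder for the (sub)gradient oracle, and the damping form xi2 * exp(-xi1 * v) and default ξ values are taken from the Experiment Setup entry.

```python
import numpy as np

def sc_adagrad(grad_fn, x0, alpha=0.1, xi1=0.1, xi2=1.0, num_steps=100):
    """Minimal sketch of an SC-Adagrad-style update.

    Assumed differences from plain Adagrad:
      * the per-coordinate denominator is v_t + delta_t, without a square root;
      * delta_t decays coordinate-wise, here as xi2 * exp(-xi1 * v_t),
        instead of a fixed numerical-stability constant.

    grad_fn(x) is a hypothetical placeholder returning the (sub)gradient at x.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)                      # running sum of squared gradients
    for t in range(1, num_steps + 1):
        g = grad_fn(x)
        v += g * g                            # v_t = v_{t-1} + g_t^2
        delta = xi2 * np.exp(-xi1 * v)        # decaying damping term
        x -= alpha * g / (v + delta)          # note: no square root in the denominator
    return x

# Usage sketch on the strongly convex quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x_final = sc_adagrad(grad_fn=lambda x: x, x0=np.ones(5), alpha=0.1)
```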
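The Open Datasets entry names MNIST, CIFAR10 and CIFAR100, but the paper does not state how they were obtained or which framework was used. The sketch below assumes torchvision purely as one convenient way to fetch the three public datasets with their standard train/test splits, the splits that the Dataset Splits entry notes are only implied.

```python
# Hedged sketch: torchvision is an assumption, not something the paper specifies.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Training sets, matching the sample counts quoted in the Open Datasets entry.
mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)       # 60000 samples, 10 classes
cifar10_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)   # 50000 samples, 10 classes
cifar100_train = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor) # 50000 samples, 100 classes

# Standard test splits (not described in the paper, only implied).
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)
cifar10_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)
cifar100_test = datasets.CIFAR100("data", train=False, download=True, transform=to_tensor)
```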
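The Experiment Setup entry fixes every hyperparameter except the stepsize. The sketch below simply collects those reported values into a small grid-search configuration; it is a convenience sketch, not the authors' script, and train is a hypothetical placeholder for a single training run.

```python
# Values as reported in the Experiment Setup entry; only the stepsize varies.
stepsizes = [1, 0.1, 0.01, 0.001, 0.0001]
damping = {
    "convex":     {"xi1": 0.1, "xi2": 1.0},   # SC-Adagrad / SC-RMSProp on convex problems
    "non_convex": {"xi1": 0.1, "xi2": 0.1},   # non-convex deep learning problems
}
delta = 1e-8  # numerical stability parameter for Adagrad, Adam, RMSProp

def run_grid(train, problem="convex"):
    """Run the stepsize grid; train(alpha, xi1, xi2, delta) -> score is a hypothetical placeholder."""
    xi = damping[problem]
    return {alpha: train(alpha, xi["xi1"], xi["xi2"], delta) for alpha in stepsizes}
```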