Variants of RMSProp and Adagrad with Logarithmic Regret Bounds
Authors: Mahesh Chandra Mukkamala, Matthias Hein
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate in the experiments that these new variants outperform other adaptive gradient techniques or stochastic gradient descent in the optimization of strongly convex functions as well as in training of deep neural networks. |
| Researcher Affiliation | Academia | ¹Department of Mathematics and Computer Science, Saarland University, Germany; ²IMPRS-CS, Max Planck Institute for Informatics, Saarbrücken, Germany. |
| Pseudocode | Yes | Algorithm 1 Adagrad; Algorithm 2 SC-Adagrad; Algorithm 3 RMSProp; Algorithm 4 SC-RMSProp (a hedged update sketch for SC-Adagrad follows the table) |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Datasets: We use three datasets where it is easy, difficult and very difficult to achieve good test performance, just in order to see if this influences the performance. For this purpose we use MNIST (60000 training samples, 10 classes), CIFAR10 (50000 training samples, 10 classes) and CIFAR100 (50000 training samples, 100 classes). We refer to (Krizhevsky, 2009) for more details on the CIFAR datasets. |
| Dataset Splits | No | The paper reports training sample counts for MNIST (60000), CIFAR10 (50000), and CIFAR100 (50000) and refers to Krizhevsky (2009) for CIFAR details, which implies the standard train/test splits. However, it does not explicitly describe a validation set, its size, or any splitting methodology beyond these training counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | Note that all methods have only one varying parameter: the stepsize α which we choose from the set of {1, 0.1, 0.01, 0.001, 0.0001} for all experiments. ... The decaying damping factor for both SC-Adagrad and SC-RMSProp is used with ξ₁ = 0.1, ξ₂ = 1 for convex problems and we use ξ₁ = 0.1, ξ₂ = 0.1 for non-convex deep learning problems. Finally, the numerical stability parameter δ used in Adagrad, Adam, RMSProp is set to 10⁻⁸ as it is typically recommended for these algorithms. (A configuration sketch of this grid follows the table.) |
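
To make the Pseudocode row concrete, below is a minimal NumPy sketch of an SC-Adagrad-style step (Algorithm 2 in the paper), written under the assumption that the update accumulates squared gradients in `v` and divides by `v` plus the decaying damping ξ₂·exp(−ξ₁·v), rather than by √v + δ as in Adagrad. Function and variable names are illustrative, and the exact element-wise operations should be checked against the paper's pseudocode.

```python
import numpy as np

def sc_adagrad_step(theta, grad, v, alpha=0.01, xi1=0.1, xi2=1.0):
    """One SC-Adagrad-style update; returns updated parameters and accumulator."""
    v = v + grad ** 2                              # accumulate squared gradients element-wise
    damping = xi2 * np.exp(-xi1 * v)               # decaying damping term built from xi1, xi2
    theta = theta - alpha * grad / (v + damping)   # note: no square root, unlike Adagrad
    return theta, v

# Toy usage on the strongly convex quadratic f(x) = 0.5 * ||x||^2.
theta = np.array([5.0, -3.0])
v = np.zeros_like(theta)
for _ in range(200):
    grad = theta                                   # gradient of 0.5 * ||x||^2 at theta
    theta, v = sc_adagrad_step(theta, grad, v, alpha=0.1)
print(theta)                                       # iterates drift toward the minimizer at 0
```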
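The Experiment Setup row lists the only hyperparameters that were varied; the sketch below simply arranges that grid as a configuration and loops over it. `run_experiment` is a hypothetical placeholder rather than code from the paper; only the numeric values come from the quoted text.

```python
from itertools import product

STEP_SIZES = [1, 0.1, 0.01, 0.001, 0.0001]    # the only tuned parameter per method
DAMPING = {                                    # (xi1, xi2) for SC-Adagrad / SC-RMSProp
    "convex": {"xi1": 0.1, "xi2": 1.0},
    "deep_learning": {"xi1": 0.1, "xi2": 0.1},
}
DELTA = 1e-8                                   # numerical stability for Adagrad / Adam / RMSProp

def run_experiment(method, alpha, setting):
    """Hypothetical placeholder for a single training run; replace with real training code."""
    extra = DAMPING[setting] if method.startswith("SC-") else {"delta": DELTA}
    print(f"{method}: alpha={alpha}, setting={setting}, {extra}")

for method, alpha in product(["Adagrad", "SC-Adagrad", "RMSProp", "SC-RMSProp"], STEP_SIZES):
    run_experiment(method, alpha, setting="convex")
```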