Riemannian Stochastic Recursive Gradient Algorithm
Authors: Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare R-SRG(+) with R-SGD with a decaying step size sequence and R-SVRG with a fixed step size. The decaying step size sequence is α_k = α_0(1 + α_0 λ ⌊k/m⌋)^{-1}, where k is the number of inner iterations and ⌊·⌋ denotes the floor function. As references, we also perform comparisons with two Riemannian batch methods with backtracking line search, R-SD and R-CG, which are the steepest descent and conjugate gradient algorithms on Riemannian manifolds, respectively (Absil et al., 2008). All experiments are executed in Matlab on a 4.0 GHz Intel Core i7 PC with 32 GB RAM, and are stopped when the gradient norm falls below 10^{-8} or a predefined maximum number of iterations is reached. |
| Researcher Affiliation | Collaboration | 1The University of Electro-Communications, Japan. 2Kyoto University, Japan. 3Microsoft, India. |
| Pseudocode | Yes | Algorithm 1 R-SRG algorithm |
| Open Source Code | Yes | The codes of R-SRG are implemented in the Matlab toolbox Manopt (Boumal et al., 2014) and are available at https://github.com/hiroyuki-kasai/RSOpt. |
| Open Datasets | Yes | Here, we use the Jester dataset (Goldberg et al., 2001) consisting of 24983 user ratings of 100 jokes. Each rating is a real number between −10 and 10. We randomly extract two ratings per user as the training set Ω and test set Φ. α_0 is chosen from {10^{-7}, ..., 10^{-2}} for R-SGD, R-SVRG, and R-SRG(+), and the batch size is 1, r = 5, and ϑ = 0.1. The maximum number of outer iterations is 30 for R-SVRG and R-SRG(+), and 60 for the others. The algorithms are initialized randomly. We also use the MovieLens-1M dataset (Mov) containing one million ratings for 3952 movies (N) from 6040 users (d). We further randomly split this set into 80/10/10 percent datasets of the entire dataset as train/validation/test partitions. |
| Dataset Splits | Yes | We further randomly split this set into 80/10/10 percent datasets of the entire dataset as train/validation/test partitions. |
| Hardware Specification | Yes | All experiments are executed in Matlab on a 4.0 GHz Intel Core i7 PC with 32 GB RAM |
| Software Dependencies | No | The paper mentions 'Matlab' and the 'Manopt' toolbox but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | All hyperparameters are selected by cross-validation. The supplementary material presents additional results. ... α_0 is tuned from {10^{-5}, ..., 10^{-1}}. m and the batch size are n and 10, respectively. ϑ = 0.05 is selected for R-SRG+. |
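The R-SGD decay schedule quoted above, α_k = α_0(1 + α_0 λ ⌊k/m⌋)^{-1}, can be sketched as follows. This is a minimal illustration of the formula only; the function name and the parameter values in the example are hypothetical, not taken from the paper.

```python
import math

def decaying_step_size(alpha0: float, lam: float, k: int, m: int) -> float:
    """Step size alpha_k = alpha0 * (1 + alpha0 * lam * floor(k / m))**(-1),
    where k is the inner-iteration count and m is the inner-loop length.
    The schedule is constant within each block of m inner iterations and
    shrinks harmonically from one block to the next."""
    return alpha0 / (1.0 + alpha0 * lam * math.floor(k / m))

# Illustrative values (not from the paper):
first = decaying_step_size(0.1, 1e-2, 0, 100)    # block 0: equals alpha0
later = decaying_step_size(0.1, 1e-2, 250, 100)  # block 2: strictly smaller
```

Note that because the floor of k/m changes only every m iterations, the step size is piecewise constant rather than decaying at every single update.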