Riemannian Stochastic Recursive Gradient Algorithm

Authors: Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare R-SRG(+) with R-SGD with a decaying step size sequence and R-SVRG with a fixed step size. The decaying step size sequence is α_k = α(1 + αλ⌊k/m⌋)⁻¹, where k is the number of inner iterations and ⌊·⌋ denotes the floor function. As references, we also perform comparisons with two Riemannian batch methods with backtracking line search, R-SD and R-CG, which are the steepest descent and conjugate gradient algorithms on Riemannian manifolds, respectively (Absil et al., 2008). All experiments are executed in Matlab on a 4.0 GHz Intel Core i7 PC with 32 GB RAM, and are stopped when the gradient norm drops below 10⁻⁸ or a predefined maximum number of iterations is reached.
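The decaying step size above stays constant within each epoch of m inner iterations and shrinks at epoch boundaries. A minimal Python sketch of that schedule (the paper's experiments are in Matlab; the function name and example values here are illustrative, not from the paper):

```python
import math

def step_size(alpha0, lam, k, m):
    # Decaying schedule alpha_k = alpha0 * (1 + alpha0 * lam * floor(k/m))^(-1),
    # where k counts inner iterations and m is the epoch length.
    # The floor makes the step size piecewise constant: it only decays
    # once per epoch, not at every inner iteration.
    return alpha0 / (1.0 + alpha0 * lam * math.floor(k / m))

# Illustrative values: alpha0 = 0.1, lambda = 1e-2, m = 100.
schedule = [step_size(0.1, 1e-2, k, m=100) for k in range(0, 300, 100)]
```

Note that for k in [0, m) the factor ⌊k/m⌋ is zero, so the first epoch runs at the full initial step size α.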
Researcher Affiliation | Collaboration | ¹The University of Electro-Communications, Japan. ²Kyoto University, Japan. ³Microsoft, India.
Pseudocode | Yes | Algorithm 1: R-SRG algorithm.
Open Source Code | Yes | The codes of R-SRG are implemented in the Matlab toolbox Manopt (Boumal et al., 2014) and are available at https://github.com/hiroyuki-kasai/RSOpt.
Open Datasets | Yes | Here, we use the Jester dataset (Goldberg et al., 2001) consisting of 24,983 users' ratings of 100 jokes. Each rating is a real number between −10 and 10. We randomly extract two ratings per user as the training set Ω and test set Φ. α is chosen from {10⁻⁷, …, 10⁻²} for R-SGD, R-SVRG, and R-SRG(+); the batch size is 1, r = 5, and ϑ = 0.1. The maximum number of outer iterations is 30 for R-SVRG and R-SRG(+), and 60 for the others. The algorithms are initialized randomly. We also use the MovieLens-1M dataset (Mov) containing one million ratings for 3,952 movies (N) from 6,040 users (d). We further randomly split this set into 80/10/10 percent train/validation/test partitions.
Dataset Splits | Yes | We further randomly split this set into 80/10/10 percent train/validation/test partitions.
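The 80/10/10 random partition described above can be sketched as follows. This is a generic Python illustration under my own assumptions (the paper's Matlab code and exact seeding are not reproduced here; the function name is hypothetical):

```python
import random

def split_80_10_10(items, seed=0):
    # Randomly partition items into 80% train, 10% validation, 10% test,
    # mirroring the MovieLens-1M protocol described in the text.
    rng = random.Random(seed)  # fixed seed for a reproducible split
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

For sizes not divisible by 10, the remainder lands in the test partition; any other rounding convention would be equally consistent with an "80/10/10 percent" description.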
Hardware Specification | Yes | All experiments are executed in Matlab on a 4.0 GHz Intel Core i7 PC with 32 GB RAM.
Software Dependencies | No | The paper mentions Matlab and the Manopt toolbox but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | All hyperparameters are selected by cross-validation. The supplementary material presents additional results. ... α is tuned from {10⁻⁵, …, 10⁻¹}. m and the batch size are n and 10, respectively. ϑ = 0.05 is selected for R-SRG+.
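The step size α is tuned over a log-spaced grid such as {10⁻⁵, …, 10⁻¹}. A minimal sketch of building such a grid and selecting a candidate by a validation score (both helper names and the scoring interface are my own illustration, not from the paper):

```python
def log_grid(lo_exp, hi_exp):
    # Log-spaced candidates 10^lo_exp, ..., 10^hi_exp (integer exponents),
    # e.g. log_grid(-5, -1) gives the grid {1e-5, ..., 1e-1}.
    return [10.0 ** e for e in range(lo_exp, hi_exp + 1)]

def pick_best(grid, score):
    # Hypothetical selection step: `score` maps a candidate step size to a
    # validation metric (lower is better); return the best candidate.
    return min(grid, key=score)
```

Sweeping only integer powers of ten keeps the search cheap; a finer grid around the winner would be the natural refinement.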