Riemannian Stochastic Recursive Momentum Method for non-Convex Optimization

Authors: Andi Han, Junbin Gao

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiment results demonstrate the superiority of the proposed algorithm. In this section, we compare our proposed RSRM with other one-sample online methods.
Researcher Affiliation | Academia | Andi Han, Junbin Gao, Discipline of Business Analytics, The University of Sydney, {andi.han, junbin.gao}@sydney.edu.au
Pseudocode | Yes | Algorithm 1 Riemannian SRM (a generic recursive-momentum sketch is given after the table).
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the proposed methodology or a link to a code repository.
Open Datasets | Yes | MNIST [LeCun et al., 1998], COVTYPE from LibSVM [Chang and Lin, 2011], YALEB [Wright et al., 2008], CIFAR100 [Krizhevsky et al., 2009], COIL100 [Nene et al., 1996], KYLBERG [Kylberg, 2014].
Dataset Splits | No | The paper mentions mini-batch sizes for training but does not provide specific details about validation dataset splits (e.g., percentages, counts, or a dedicated validation set methodology).
Hardware Specification | Yes | All algorithms are coded in Matlab and experiments are conducted on a laptop with an i5-8600 3.1GHz CPU.
Software Dependencies | No | The paper states 'All algorithms are coded in Matlab' but does not specify a version number for Matlab or any other software dependencies.
Experiment Setup | Yes | For competing methods, we consider a square-root decaying step size η_t = η_0 t^(-1/2), as suggested in [Kasai et al., 2019]. We set the parameters of RSRM according to the theory, i.e. η_t = η_0 t^(-1/3) and ρ_t = ρ_0 t^(-2/3). A default value of ρ_0 = 0.1 provides good empirical performance. For all methods, η_0 is selected from {1, 0.5, 0.1, ..., 0.005, 0.001}. The gradient momentum parameter in cSGD-M and RAMSGRAD is set to 0.9 and the adaptation momentum parameter in cRMSProp, RAMSGRAD and RASA is set to 0.999. We choose a mini-batch size of 5 for RSRM and 10 for all other algorithms to ensure an identical per-iteration cost of gradient evaluation. The initial batch size for RSRM is fixed at 100 (except for the ICA problem, where it is set to 200). An illustrative sketch of these schedules is given after the table.
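
To make the pseudocode row concrete: Algorithm 1 (Riemannian SRM) is a recursive-momentum (STORM-style) method, so a minimal sketch of that update pattern on the unit sphere is given below. This is an illustration written from the general description only; the manifold, the leading-eigenvector objective, and the helper names (project_to_tangent, retract, transport, stochastic_grad) are assumptions made for this sketch, not the paper's released code.

```python
import numpy as np

def project_to_tangent(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, v):
    """Retraction on the sphere: step along v in the ambient space, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def transport(x_new, v):
    """Vector transport by projection onto the tangent space at the new point."""
    return project_to_tangent(x_new, v)

def stochastic_grad(x, a):
    """Riemannian stochastic gradient of f(x) = -0.5 * (a' x)^2 for one sample a."""
    egrad = -(a @ x) * a
    return project_to_tangent(x, egrad)

def rsrm(x0, samples, eta0=0.1, rho0=0.1, n_iters=1000, seed=0):
    """Sketch of a Riemannian recursive-momentum loop with the decaying schedules quoted above."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    d = stochastic_grad(x, samples[rng.integers(len(samples))])  # initial gradient estimator
    for t in range(1, n_iters + 1):
        eta_t = eta0 * t ** (-1.0 / 3.0)             # step size: eta_0 * t^(-1/3)
        rho_t = min(1.0, rho0 * t ** (-2.0 / 3.0))   # momentum weight: rho_0 * t^(-2/3)
        x_new = retract(x, -eta_t * d)
        a = samples[rng.integers(len(samples))]      # one fresh sample per iteration
        g_new = stochastic_grad(x_new, a)
        g_old = stochastic_grad(x, a)                # same sample, previous iterate
        # Recursive momentum: transported correction of the previous estimator.
        d = g_new + (1.0 - rho_t) * transport(x_new, d - g_old)
        x = x_new
    return x
```

The defining line is the estimator update d = g_new + (1 - ρ_t) * transport(x_new, d - g_old): the same sample is evaluated at both the new and the previous iterate, and the previous estimator is transported into the new tangent space before being corrected.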
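
The schedules and batch settings in the experiment setup row are simple power-law decays and constants; the following short sketch (variable names are assumptions, not from any released code) restates them so the decay rates are explicit.

```python
def rsrm_schedule(t, eta0=0.1, rho0=0.1):
    """RSRM schedules from the theory: eta_t = eta_0 * t^(-1/3), rho_t = rho_0 * t^(-2/3)."""
    return eta0 * t ** (-1.0 / 3.0), rho0 * t ** (-2.0 / 3.0)

def baseline_schedule(t, eta0=0.1):
    """Square-root decaying step size used for the competing one-sample methods."""
    return eta0 * t ** (-1.0 / 2.0)

# Batch settings reported in the setup: mini-batch of 5 for RSRM vs. 10 for the
# other methods (equal per-iteration gradient cost), and an initial batch of 100
# for RSRM (200 for the ICA problem).
BATCH_SIZE_RSRM, BATCH_SIZE_OTHERS = 5, 10
INITIAL_BATCH_RSRM = 100
```

For example, rsrm_schedule(1000) returns (0.01, 0.001) with the illustrative defaults η_0 = ρ_0 = 0.1.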