Riemannian adaptive stochastic gradient algorithms on matrix manifolds

Authors: Hiroyuki Kasai, Pratik Jawanpuria, Bamdev Mishra

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we compare our algorithms with the existing Riemannian adaptive gradient algorithms. In applications such as principal component analysis and matrix completion, we observe that our algorithms perform better than the baselines in most experiments, on both synthetic and real-world datasets. Section 6 provides experiments with figures illustrating performance on various datasets such as MNIST, COIL100, MovieLens, and Yale B.
Researcher Affiliation | Collaboration | (1) Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan; (2) Microsoft, India.
Pseudocode | Yes | Algorithm 1: Riemannian adaptive stochastic algorithm.
Open Source Code | Yes | Our codes are available at https://github.com/hiroyuki-kasai/RSOpt.
Open Datasets | Yes | We additionally evaluate RASA on the MNIST (Case P2) and COIL100 (Case P3) datasets. MNIST contains handwritten digit data for 0–9 (LeCun et al.)... COIL100 contains normalized 7,200 color camera images of 100 objects taken from different angles (Nene et al., 1996). We show the results on the MovieLens datasets (Harper & Konstan, 2015)... We use the Yale B (Georghiades et al., 2001) dataset...
Dataset Splits | Yes | We randomly split the data into 80/20 train/test partitions (MovieLens datasets).
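The 80/20 random split quoted above can be sketched as follows. This is a minimal illustration of the stated protocol, not the authors' actual preprocessing code; the function name and fixed seed are assumptions.

```python
import numpy as np

def train_test_split_80_20(n_ratings, seed=0):
    """Randomly partition rating indices into 80% train / 20% test.

    Illustrative sketch of the split described in the paper's setup;
    the seed handling is an assumption added for reproducibility.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_ratings)  # random order over all ratings
    cut = int(0.8 * n_ratings)         # 80% boundary
    return perm[:cut], perm[cut:]

train_idx, test_idx = train_test_split_80_20(1000)
```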
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper mentions the 'Matlab toolbox Manopt (Boumal et al., 2014)' and 'Python libraries like McTorch (Meghwanshi et al., 2018) and geomstats (Miolane et al., 2018)', but does not provide specific version numbers for these software dependencies as required for reproducibility.
Experiment Setup | Yes | The algorithms are initialized from the same initialization point and are stopped when the iteration count reaches a predefined value. We fix the batch size to 10 (except on the larger MovieLens datasets, where it is set to 100). The step size sequence {αt} is generated as αt = α0/t... We experiment with different values of the initial step size α0. The β value for the adaptive algorithms (all except RSGD) is fixed to 0.99. The momentum-related β term (used only in Radam and Ramsgrad) is set to 0.9.
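The hyperparameter choices quoted above can be collected into a small sketch. This assumes the decaying schedule αt = α0/t exactly as quoted (with t starting at 1); the variable names and the example α0 value are illustrative assumptions, not the authors' code.

```python
def make_step_sizes(alpha0, num_iters):
    """Decaying step-size sequence alpha_t = alpha0 / t, as quoted
    in the experiment setup (t = 1, 2, ..., num_iters)."""
    return [alpha0 / t for t in range(1, num_iters + 1)]

# Hyperparameter values reported in the setup text; the dictionary
# keys are illustrative names introduced here, not from the paper.
config = {
    "batch_size": 10,      # 100 on the larger MovieLens datasets
    "beta": 0.99,          # adaptive term (all algorithms except RSGD)
    "beta_momentum": 0.9,  # momentum term (Radam and Ramsgrad only)
}

steps = make_step_sizes(alpha0=0.1, num_iters=5)
```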