Riemannian adaptive stochastic gradient algorithms on matrix manifolds
Authors: Hiroyuki Kasai, Pratik Jawanpuria, Bamdev Mishra
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we compare our algorithms with the existing Riemannian adaptive gradient algorithms. In applications such as principal component analysis and matrix completion, we observe that our algorithms perform better than the baselines in most experiments, on both synthetic and real-world datasets. Section 6 provides experiments with figures illustrating performance on datasets such as MNIST, COIL100, MovieLens, and Yale B. |
| Researcher Affiliation | Collaboration | (1) Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan. (2) Microsoft, India. |
| Pseudocode | Yes | Algorithm 1: Riemannian adaptive stochastic algorithm (a hedged sketch of this style of update is given after the table). |
| Open Source Code | Yes | Our codes are available at https://github.com/hiroyuki-kasai/RSOpt. |
| Open Datasets | Yes | We additionally evaluate RASA on the MNIST (Case P2) and COIL100 (Case P3) datasets. MNIST contains handwritten digit data of 0-9 (LeCun et al.)... COIL100 contains normalized 7,200 color camera images of the 100 objects taken from different angles (Nene et al., 1996). We show the results on the MovieLens datasets (Harper & Konstan, 2015)... We use the Yale B (Georghiades et al., 2001) dataset... |
| Dataset Splits | Yes | We randomly split the data into 80/20 train/test partitions. (MovieLens datasets; a minimal sketch of such a split follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'Matlab toolbox Manopt (Boumal et al., 2014)' and 'Python libraries like McTorch (Meghwanshi et al., 2018) and geomstats (Miolane et al., 2018)', but does not provide specific version numbers for these software dependencies as required for reproducibility. |
| Experiment Setup | Yes | The algorithms are initialized from the same initialization point and are stopped when the iteration count reaches a predefined value. We fix the batch size to 10 (except on the larger MovieLens datasets, where it is set to 100). The step size sequence {α_t} is generated as α_t = α_0/√t... We experiment with different values for the initial step size α_0. The β value for adaptive algorithms (all except RSGD) is fixed to 0.99. The momentum-related β term (used only in Radam and Ramsgrad) is set to 0.9. (The setup is restated as a configuration sketch after the table.) |
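The quoted Algorithm 1 (RASA) adapts the stochastic Riemannian gradient with separate row and column second-moment estimates before retracting. Below is a minimal Python sketch of that style of update, applied to streaming PCA on the Stiefel manifold. The function names, the QR retraction, and the toy problem are our illustrative assumptions, and the AMSGrad-style maximum over past moments used in the paper's analysis is omitted for brevity; consult the authors' RSOpt repository for the actual implementation.

```python
# A minimal sketch of a RASA-style update (adaptive row/column scaling of
# the Riemannian stochastic gradient), not the authors' Algorithm 1 verbatim.
import numpy as np

def proj_stiefel(U, G):
    """Project a Euclidean gradient G onto the tangent space at U."""
    UtG = U.T @ G
    return G - U @ (UtG + UtG.T) / 2

def qr_retraction(U, X):
    """Map the tangent step X back onto the Stiefel manifold via QR."""
    Q, R = np.linalg.qr(U + X)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)  # resolve QR sign ambiguity

def rasa_step(U, G, vL, vR, t, alpha0=0.1, beta=0.99, eps=1e-8):
    """One adaptive step; vL/vR are EMAs of squared row/column norms."""
    n, r = G.shape
    vL = beta * vL + (1 - beta) * np.sum(G**2, axis=1) / r  # row (left) moments
    vR = beta * vR + (1 - beta) * np.sum(G**2, axis=0) / n  # column (right) moments
    # Fourth roots: the combined row-by-column scaling then behaves like the
    # usual 1/sqrt(second moment) of Adam-type methods.
    scale = np.outer((vL + eps) ** 0.25, (vR + eps) ** 0.25)
    xi = proj_stiefel(U, G / scale)          # re-project the scaled direction
    alpha_t = alpha0 / np.sqrt(t)            # decaying step size alpha_0/sqrt(t)
    return qr_retraction(U, -alpha_t * xi), vL, vR

# Toy usage (hypothetical sizes): track the top-5 subspace of 50-dim samples.
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((50, 5)))[0]
vL, vR = np.zeros(50), np.zeros(5)
for t in range(1, 1001):
    z = rng.standard_normal(50)
    Ge = -2.0 * np.outer(z, z @ U)           # Euclidean gradient of -||z^T U||^2
    U, vL, vR = rasa_step(U, proj_stiefel(U, Ge), vL, vR, t)
```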
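The MovieLens split quoted above is a plain random 80/20 partition of the ratings. A minimal sketch, assuming the ratings are already loaded as an array of (user, item, rating) triplets; the loading details and function name are our assumptions, as the paper does not specify them:

```python
import numpy as np

def split_ratings(ratings, train_frac=0.8, seed=0):
    """Randomly partition rating triplets into 80/20 train/test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(ratings))
    cut = int(train_frac * len(ratings))
    return ratings[perm[:cut]], ratings[perm[cut:]]
```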
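The reported hyperparameters can be collected into a small configuration sketch; the key names below are our own, the values are the ones quoted from the setup, and the schedule makes the α_t = α_0/√t decay explicit:

```python
import math

# Our own naming; values are those reported in the paper's experiment setup.
SETUP = {
    "batch_size": 10,   # 100 on the larger MovieLens datasets
    "beta": 0.99,       # adaptive (second-moment) decay, all methods but RSGD
    "beta1": 0.9,       # momentum term, used only by Radam and Ramsgrad
}

def step_size(t, alpha0):
    """Step size schedule alpha_t = alpha0 / sqrt(t) for t = 1, 2, ..."""
    return alpha0 / math.sqrt(t)
```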