Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds

Authors: Daniel Dodd, Louis Sharrock, Christopher Nemeth

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our approach is validated through numerical experiments, demonstrating competitive performance against learning-rate-dependent algorithms. We assess the numerical performance of RDo G (Algorithm 1), RDo WG (Algorithm 2), and NRDo G against manually tuned RSGD (Bonnabel, 2013) and RADAM (Becigneul & Ganea, 2019).
Researcher Affiliation Academia Daniel Dodd 1 Louis Sharrock 1 Christopher Nemeth 1 1Department of Mathematics and Statistics, Lancaster University, UK.
Pseudocode Yes Algorithm 1 RDo G; Algorithm 2 RDo WG; Algorithm 3 T-RDo WG; Algorithm 4 in Appendix F.
Open Source Code Yes Code to reproduce the experiments is available at https://github.com/daniel-dodd/riemannian_dog.
Open Datasets Yes We consider datasets Wine, Waveform-5000, and Tiny Image Net. The Word Net noun hierarchy (Miller et al., 1990) is a lexical database of English words organized into a hierarchical structure.
Dataset Splits No The paper mentions 'Each dataset has an 80:20 train-test split per replication.' but does not specify a separate validation split.
Hardware Specification Yes Implementing all algorithms in Python 3 with JAX (Bradbury et al., 2018), our experiments run on a Mac Book Pro 16 (2021) with an Apple M1 Pro chip and 16GB of RAM.
Software Dependencies No The paper states 'Implementing all algorithms in Python 3 with JAX'. While 'Python 3' is a version, JAX is not given with a specific version number. Other software like scikit-learn is mentioned without a version.
Experiment Setup Yes We employ RADAM and RSGD with a grid of twenty logarithmically spaced learning rates η [10 8, 106]. On the other hand, we investigate RDo G and RDo WG with ten logarithmically spaced initial distance values ϵ [10 8, 100]. In training, Wine uses the full batch for T = 5000 iterations, and Waveform-5000 and Tiny Image Net use batch sizes of 64 for T = 2000 iterations. For initialization, following Nickel & Kiela (2017), we uniformly initialize the embeddings in [ 10 3, 10 3]d and consider ten logarithmically spaced learning rates η [10 2, 102] and five logarithmically spaced initial distance estimates ϵ [10 10, 10 6]. In the first ten epochs, we use RSGD with a reduced learning rate of η/10 for RSGD and RADAM. Thereafter, we run the optimizers on the initialized embeddings for one thousand epochs, with each iteration having a batch size of ten and fifty uniformly sampled negative samples. We repeat this experiment over five replications.