Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

An Adaptive Algorithm for Bilevel Optimization on Riemannian Manifolds

Authors: Xu Shi, Rufeng Xiao, Rujun Jiang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments demonstrate that AdaRHD achieves comparable performance to existing non-adaptive approaches while exhibiting greater robustness.
Researcher Affiliation	Academia	(1) School of Data Science, Fudan University; (2) Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University
Pseudocode	Yes	Algorithm 1 Adaptive Riemannian Hypergradient Descent (AdaRHD)... Algorithm 2 Tangent Space Conjugate Gradient... Algorithm 3 AdaRHD with Retraction (AdaRHD-R)... Algorithm 4 AdaRHD for Riemannian Min-Max Optimization
Open Source Code	Yes	Detailed experimental settings and additional experiments are provided in Appendix I, and our codes are available at https://github.com/RufengXiao/AdaRHD.
Open Datasets	Yes	utilize a 3-layer SPD network [40] as the upper-level architecture to optimize input embeddings over the larger AFEW dataset [20], comprising seven emotion classes. ... [20] Abhinav Dhall, Roland Goecke, Jyoti Joshi, Karan Sikka, and Tom Gedeon. Emotion recognition in the wild challenge 2014: Baseline, data and protocol. In Proceedings of the International Conference on Multimodal Interaction, pages 461–466, 2014.
Dataset Splits	Yes	To address computational constraints, we utilize a 5% subset of the AFEW dataset [20] rather than the full dataset, ensuring tractable training durations. ... In each trial, a randomly sampled validation set is reserved for the upper-level problem, with an equally sized training partition allocated to the lower-level task.
Hardware Specification	Yes	All implementations are executed using the Geoopt framework [48]... Furthermore, all the experiments are implemented based on Geoopt [48] and are implemented using Python 3.8 on a Linux server with 256GB RAM and 96-core AMD EPYC 7402 2.8GHz CPU.
Software Dependencies	Yes	All implementations are executed using the Geoopt framework [48]... Furthermore, all the experiments are implemented based on Geoopt [48] and are implemented using Python 3.8...
Experiment Setup	Yes	For n = 100, the maximum number of outer iterations in RHGD [33] is set to 200, whereas for n = 1000, this number is increased to 400. In Algorithm 3, the number of outer iterations is set to T = 1000 for AdaRHD-GD and T = 10000 for AdaRHD-CG. Following [33], we set λ = 0.01 and fix the step sizes in RHGD as ηx = ηy = 0.5. To ensure consistency, the initial step sizes in Algorithm 3 are set to a0 = b0 = c0 = 2.
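
For readers who want to re-run the comparison, the settings quoted in the Experiment Setup row can be gathered into one configuration object. The following is a minimal sketch only: the class, field, and function names are hypothetical and are not taken from the authors' repository, and only the numeric values come from the quoted text.

```python
from dataclasses import dataclass

# Hypothetical container: names are illustrative assumptions,
# values are copied from the Experiment Setup row above.
@dataclass(frozen=True)
class ExperimentSetup:
    lam: float = 0.01          # lambda, set following [33]
    rhgd_step_x: float = 0.5   # fixed RHGD step size eta_x
    rhgd_step_y: float = 0.5   # fixed RHGD step size eta_y
    a0: float = 2.0            # initial step size in Algorithm 3
    b0: float = 2.0            # initial step size in Algorithm 3
    c0: float = 2.0            # initial step size in Algorithm 3

# Maximum number of RHGD outer iterations, keyed by problem size n.
RHGD_MAX_OUTER_ITERS = {100: 200, 1000: 400}

# Number of outer iterations T in Algorithm 3, keyed by variant.
ADARHD_OUTER_ITERS = {"AdaRHD-GD": 1000, "AdaRHD-CG": 10000}


def rhgd_max_outer_iters(n: int) -> int:
    """Return the RHGD outer-iteration cap for a problem size listed in the row."""
    if n not in RHGD_MAX_OUTER_ITERS:
        # The quoted text only reports n = 100 and n = 1000; do not guess others.
        raise KeyError(f"no reported setting for n = {n}")
    return RHGD_MAX_OUTER_ITERS[n]


setup = ExperimentSetup()
```

The lookup deliberately raises for problem sizes the row does not report, so that unreported settings are never silently invented.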