Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization
Authors: Xiaochuan Gong, Jie Hao, Mingrui Liu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms. (Abstract) ... In this section, we empirically evaluate our proposed algorithms on three tasks, including synthetic test functions (Section 6.1), deep AUC maximization (Section 6.2), and hyperparameter optimization (Appendix K). |
| Researcher Affiliation | Academia | Xiaochuan Gong (George Mason University), Jie Hao (George Mason University), Mingrui Liu (George Mason University) |
| Pseudocode | Yes | Algorithm 1 Adaptive Algorithm for Minimax Optimization (Ada-Minimax) ... Algorithm 2 Adaptive Algorithm for Bilevel Optimization (Ada-BiO) |
| Open Source Code | Yes | The code is available at https://github.com/MingruiLiu-ML-Lab/adaptive-hierarchical-optimization. |
| Open Datasets | Yes | We first construct the imbalanced binary classification dataset Sentiment140 [23] (under Creative Commons Attribution 4.0 License). ... We consider hyperparameter optimization on the TREC text classification dataset [49], provided under the Creative Commons Attribution 4.0 License. |
| Dataset Splits | No | In our experiments, we employ a BERT model with 4 self-attention layers, each comprising 4 attention heads, followed by a fully-connected layer with an output dimension of 6, corresponding to the six classification categories. The model is trained from scratch for 50 epochs. We compare our algorithm's training and test performance against the tuning-free bilevel optimization (TFBO) method proposed by [73]. For TFBO, we conduct a grid search to select optimal initial values for the upper-level learning rate α0, lower-level learning rate β0, and linear system learning rate φ within the range [1.0 × 10⁻⁵, 10.0], and set them to {0.01, 0.1, 0.1}. For Ada-BiO, we similarly perform hyperparameter tuning over the parameters (ηx, ηy, α, γ) within the range [1.0 × 10⁻⁵, 1.0], selecting the optimal values (1.0 × 10⁻⁵, 0.5, 1.0, 0.1) for evaluation. |
| Hardware Specification | Yes | The computations were run on Hopper, a research computing cluster provided by the Office of Research Computing at George Mason University (URL: https://orc.gmu.edu). |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers. |
| Experiment Setup | Yes | For synthetic experiments, we tune hyperparameters for each baseline using a grid search and report their best results. Both the parameter α used in the momentum parameter estimate (3) and the base learning rates (ηx, ηy) are tuned within the set {0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0}. We use the following parameter choices for various noise magnitudes: for σ = 0, α = 2.0, ηx = 3.0, ηy = 3.0 for Ada-Minimax, and ηx = 4.0, ηy = 4.0 for TiAda; for σ = 20, α = 2.0, ηx = 1.5, ηy = 1.5 for Ada-Minimax, and ηx = 2.0, ηy = 2.0 for TiAda; for σ = 50, α = 3.0, ηx = 2.0, ηy = 2.0 for Ada-Minimax, and ηx = 2.0, ηy = 2.0 for TiAda; for σ = 100, α = 5.0, ηx = 3.0, ηy = 3.0 for Ada-Minimax, and ηx = 2.5, ηy = 2.5 for TiAda. Other hyperparameters in TiAda are set to the default choices as suggested in [47]. (Appendix I) |
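
The synthetic-experiment settings quoted in the Experiment Setup row can be recorded as a small lookup table, which may help when trying to rerun those experiments. The sketch below is illustrative only: the dictionary layout, the key names (`alpha`, `eta_x`, `eta_y`), and both helper functions are hypothetical and are not taken from the authors' repository. It also assumes the grid search covers the full Cartesian product of the quoted tuning set, which the excerpt does not state explicitly.

```python
from itertools import product

# Tuning set quoted in the Experiment Setup row (Appendix I of the paper).
TUNING_SET = (0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0)

# Reported choices per noise magnitude sigma, copied from the quoted excerpt.
# Key names are hypothetical; TiAda's remaining hyperparameters are stated
# to follow the defaults of reference [47] and are not listed here.
REPORTED = {
    0: {"Ada-Minimax": {"alpha": 2.0, "eta_x": 3.0, "eta_y": 3.0},
        "TiAda": {"eta_x": 4.0, "eta_y": 4.0}},
    20: {"Ada-Minimax": {"alpha": 2.0, "eta_x": 1.5, "eta_y": 1.5},
         "TiAda": {"eta_x": 2.0, "eta_y": 2.0}},
    50: {"Ada-Minimax": {"alpha": 3.0, "eta_x": 2.0, "eta_y": 2.0},
         "TiAda": {"eta_x": 2.0, "eta_y": 2.0}},
    100: {"Ada-Minimax": {"alpha": 5.0, "eta_x": 3.0, "eta_y": 3.0},
          "TiAda": {"eta_x": 2.5, "eta_y": 2.5}},
}


def candidate_grid(values=TUNING_SET):
    """Enumerate every (alpha, eta_x, eta_y) triple from the tuning set.

    Assumption: the search is a full Cartesian product over the set.
    """
    return [
        {"alpha": a, "eta_x": x, "eta_y": y}
        for a, x, y in product(values, repeat=3)
    ]


def reported_setting(sigma, method):
    """Return a copy of the reported hyperparameters for one configuration."""
    return dict(REPORTED[sigma][method])
```

Under the full-product assumption, `candidate_grid()` yields 7 × 7 × 7 candidate triples per noise level, and `reported_setting(50, "Ada-Minimax")` returns the values quoted for σ = 50.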