Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles
Authors: Lesi Chen, Yaohua Ma, Jingzhao Zhang
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to showcase the superiority of our proposed methods. We implement the algorithms using PyTorch (Paszke et al., 2019). In all the experiments, we tune the step size in the grid {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3} and present the best result of each algorithm. Figure 1: Numerical verification of F2BA on Problem (10). Figure 2: Comparison of deterministic algorithms on Problem (11). Our proposed algorithms F2BA as well as its acceleration AccF2BA outperform baselines. Figure 3: Comparison of stochastic algorithms on Problem (12) under different corruption ratios p. Due to the use of two-time-scale step size, our proposed F2BSA significantly outperforms single-time-scale baselines PBGD and F2SA. |
| Researcher Affiliation | Academia | Lesi Chen* EMAIL IIIS, Tsinghua University Shanghai Qi Zhi Institute Beijing, China Yaohua Ma* EMAIL IIIS, Tsinghua University Beijing, China Jingzhao Zhang EMAIL IIIS, Tsinghua University Shanghai AI Lab Shanghai Qi Zhi Institute Beijing, China |
| Pseudocode | Yes | Algorithm 1 F2BA(x0, y0) Algorithm 2 F2BSA(x0, y0) Algorithm 3 Perturbed F2BA(x0, y0) Algorithm 4 Accelerated F2BA(x0, y0) Algorithm 5 F3BSA(x, y0) Algorithm 6 MLMC-HyperGradEst(x, y0, z0, φf, φg, M, N, S) Algorithm 7 HyperGradEst(x, y0, z0, φf, φg, N, K) Algorithm 8 SGD(h, x0, η, K) Algorithm 9 SGDSC(h, x0, N, K) Algorithm 10 Inexact Perturbed Gradient Descent Algorithm 11 Inexact Nonconvex Accelerated Gradient Descent Algorithm 12 Distributed F2BA(x0, y0, ηx, ηy, ηz, λ, T, K) |
| Open Source Code | No | The paper does not provide an explicit statement or a link to a code repository for their methods. It mentions that they implement the algorithms using PyTorch, but this refers to the use of a third-party library, not the release of their own source code. |
| Open Datasets | Yes | We use the abalone dataset (available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/), which contains 4,177 samples and each sample has 8 features (p = 8). ... We conduct the experiments on the 20 newsgroup dataset which is commonly used by previous works (Grazzi et al., 2020; Ji et al., 2021). ... In our experiment, we use the Alpaca dataset (Taori et al., 2023), which contains 52k instructions and demonstration outputs generated by OpenAI's text-davinci-003 engine. ... Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023. |
| Dataset Splits | Yes | We split the dataset into the training set and the validation set in a 7:3 ratio. ... We split the dataset into a training set and a validation set in an 8:2 ratio |
| Hardware Specification | Yes | We run the experiments on 8 A40 GPUs. |
| Software Dependencies | No | The paper states: 'We implement the algorithms using PyTorch (Paszke et al., 2019).' However, it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In all the experiments, we tune the step size in the grid {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3} and present the best result of each algorithm. ... We run F2BA (Algorithm 1) with K = 10, λ = 10^3... We set the number of inner loops as 10 for all the algorithms, and additionally tune λ from {10^1, 10^2, 10^3, 10^4} for penalty based methods F2BA, AccF2BA, F2SA, and PBGD. For AccF2BA, we set θ = 0.1 as commonly used in momentum-based gradient methods. ... We take γ = 0.99 in our experiments. ... For all the algorithms, we set batch size as 64, and a fixed penalty with λ = 10^3. |
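The grid-search protocol quoted above (try every step size in the grid, report the best run) can be sketched as follows. This is a minimal illustration, not the authors' code: the toy objective f(x) = x², the starting point, and the iteration count are assumptions; the paper tunes over its bilevel problems (10)–(12) instead.

```python
import math

def final_loss(step_size, n_steps=50):
    """Plain gradient descent on the toy objective f(x) = x^2 from x0 = 1.

    Stands in for one training run of an algorithm at a fixed step size.
    """
    x = 1.0
    for _ in range(n_steps):
        x = x - step_size * 2.0 * x  # gradient of x^2 is 2x
        if not math.isfinite(x):     # run diverged: score it as infinitely bad
            return math.inf
    return x * x

# Step-size grid from the paper: {10^-5, ..., 10^3}.
grid = [10.0 ** k for k in range(-5, 4)]

# One run per grid point; keep the step size with the best final objective.
losses = {eta: final_loss(eta) for eta in grid}
best_eta = min(losses, key=losses.get)
```

Treating divergent runs as infinitely bad (rather than letting NaNs propagate into the comparison) is what makes the `min` over the grid well defined when large step sizes blow up.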