Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unlocking Global Optimality in Bilevel Optimization: A Pilot Study

Authors: Quan Xiao, Tianyi Chen

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments corroborate the theoretical findings, demonstrating convergence to global minimum in both cases.
Researcher Affiliation	Academia	Quan Xiao Rensselaer Polytechnic Institute Troy, NY 12180, United States EMAIL Tianyi Chen Rensselaer Polytechnic Institute Troy, NY 12180, United States EMAIL
Pseudocode	Yes	Algorithm 1 PBGD in Jacobi fashion ... Algorithm 2 PBGD in Gauss-Seidel fashion
Open Source Code	No	The paper does not contain any explicit statements about the release of source code or links to a code repository.
Open Datasets	No	The paper describes generating synthetic datasets for its numerical experiments (e.g., "generate data matrix $X_{\mathrm{trn}} \in \mathbb{R}^{N \times m}$, $X_{\mathrm{val}} \in \mathbb{R}^{N' \times m}$ from Gaussian distribution $\mathcal{N}(5, 0.01)$" in Sections H.1 and H.2), but does not provide concrete access information (links, DOIs, citations) for a publicly available or open dataset. The generated data itself is not stated to be made public.
Dataset Splits	Yes	H.1 Representation Learning: Considering the overparameterized and wide neural network case, we choose $N = 30$, $N' = 20$, $m = 40$, $n = 10$, $h = 300$. H.2 Data Hyper-cleaning: Considering the overparameterized linear regression with a small clean validation dataset and a large dirty training dataset, we choose $N = 100$, $N' = 10$, $m = 200$, $n = 10$.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments in the 'Numerical Experiments' section or elsewhere.
Software Dependencies	No	The paper does not provide specific software dependencies (e.g., library names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment.
Experiment Setup	Yes	H.1 Representation Learning: ...we choose $N = 30$, $N' = 20$, $m = 40$, $n = 10$, $h = 300$. First, we respectively generate data matrix $X_{\mathrm{trn}} \in \mathbb{R}^{N \times m}$, $X_{\mathrm{val}} \in \mathbb{R}^{N' \times m}$ from Gaussian distribution $\mathcal{N}(5, 0.01)$ and $\mathcal{N}(-3, 0.01)$... We select the best stepsizes $\alpha$, $\beta$ and the number of inner loop $T_k = T$ by grid search. H.2 Data Hyper-cleaning: ...we choose $N = 100$, $N' = 10$, $m = 200$, $n = 10$. First, we respectively generate data matrix $X_{\mathrm{trn}} \in \mathbb{R}^{N \times m}$, $X_{\mathrm{val}} \in \mathbb{R}^{N' \times m}$ from Gaussian distribution $\mathcal{N}(5, 0.01)$ and $\mathcal{N}(-3, 0.01)$...
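
Because no dataset or code is released, anyone reproducing the experiments has to regenerate the synthetic matrices from the quoted description. The following is a minimal sketch of that step for the H.1 sizes, not the authors' code. It makes several assumptions: the second parameter 0.01 is read as a variance (standard deviation 0.1), the validation mean is taken to be -3, entries are drawn independently, and the function name `make_synthetic_split`, the argument names `n_trn` and `n_val`, and the seed are all hypothetical.

```python
import math
import random


def make_synthetic_split(n_trn, n_val, m, seed=0):
    """Draw training and validation matrices with i.i.d. Gaussian entries.

    Assumptions (not confirmed by the source): 0.01 is a variance,
    and the validation distribution has mean -3.
    """
    rng = random.Random(seed)   # fixed seed so the draw is repeatable
    std = math.sqrt(0.01)       # variance 0.01 -> standard deviation 0.1
    x_trn = [[rng.gauss(5.0, std) for _ in range(m)] for _ in range(n_trn)]
    x_val = [[rng.gauss(-3.0, std) for _ in range(m)] for _ in range(n_val)]
    return x_trn, x_val


# Sizes quoted for H.1: 30 training rows, 20 validation rows, m = 40 columns.
x_trn, x_val = make_synthetic_split(n_trn=30, n_val=20, m=40)
print(len(x_trn), len(x_trn[0]), len(x_val), len(x_val[0]))
```

Since the paper reports neither a random seed nor software versions, a regenerated dataset will match the original only in distribution, not entry by entry.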