Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Problem-Parameter-Free Decentralized Bilevel Optimization

Authors: Zhiwei Zhai, Wenjing Yan, Ying-Jun Zhang

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting remarkable robustness across diverse stepsize configurations. In this section, we evaluate the performance of Algorithm 1 on the hyperparameter optimization problem, as illustrated in Section E.1. Our algorithm is compared with several decentralized bilevel optimization methods, including SLDBO [Dong et al., 2023], MA-DSBO [Chen et al., 2023], MDBO [Gao et al., 2023], and DBO [Chen et al., 2024a]. Experiments are conducted on both synthetic and real-world datasets, with detailed configurations and additional results provided in Section E.
Researcher Affiliation Academia Zhiwei Zhai, Wenjing Yan, Ying-Jun Angela Zhang; Department of Information Engineering, The Chinese University of Hong Kong; EMAIL
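Pseudocode Yes Algorithm 1 Adaptive Single-Loop Decentralized Bilevel Optimization: Procedures at Each Agent $i \in [n]$
1: Initialization: $x_{i,0}$, $y_{i,0}$, $v_{i,0}$, $m^x_{i,0} = m^y_{i,0} = m^v_{i,0} > 0$, $\gamma_x = \gamma_y = \gamma_v > 0$.
2: for $t = 0, 1, \ldots, T-1$ do
3: Compute the gradients: $g^y_{i,t} = \nabla_y l_i(x_{i,t}, y_{i,t})$, $g^v_{i,t} = \nabla_y \nabla_y l_i(x_{i,t}, y_{i,t})\, v_{i,t} - \nabla_y f_i(x_{i,t}, y_{i,t})$, $g^x_{i,t} = \nabla_x f_i(x_{i,t}, y_{i,t}) - \nabla_x \nabla_y l_i(x_{i,t}, y_{i,t})\, v_{i,t}$.
4: Accumulate the gradient norms: $[m^x_{i,t+1}]^2 = [m^x_{i,t}]^2 + \|g^x_{i,t}\|^2$, $[m^y_{i,t+1}]^2 = [m^y_{i,t}]^2 + \|g^y_{i,t}\|^2$, $[m^v_{i,t+1}]^2 = [m^v_{i,t}]^2 + \|g^v_{i,t}\|^2$.
5: Update the primal, dual, and auxiliary variables by: $y_{i,t+1} = y_{i,t} - \frac{\gamma_y}{m^y_{i,t+1}}\, g^y_{i,t}$, $v_{i,t+1} = v_{i,t} - \frac{\gamma_v}{\max(m^v_{i,t+1},\, m^y_{i,t+1})}\, g^v_{i,t}$, $x_{i,t+1} = x_{i,t} - \frac{\gamma_x}{m^x_{i,t+1} \max(m^v_{i,t+1},\, m^y_{i,t+1})}\, g^x_{i,t}$.
6: Information exchange with neighbors: $\{x, y, v\}_{i,t+1} \leftarrow \sum_j w_{i,j} \{x, y, v\}_{j,t+1}$, $\{m^x, m^y, m^v\}_{i,t+1} \leftarrow \sum_j w_{i,j} \{m^x, m^y, m^v\}_{j,t+1}$.
7: Projection of auxiliary variable on the set $\mathcal{V}$: $v_{i,t+1} \leftarrow P_{\mathcal{V}}(v_{i,t+1})$.
8: end for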
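The steps of Algorithm 1 can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the scalar quadratic objectives, the ring mixing matrix, the interval used as the projection set, and every function and parameter name (`ring_weights`, `mix`, `adasdbo_toy`, `radius`) are assumptions made for illustration. The sketch also assumes the sign convention g^v = (Hessian of l in y) v - (gradient of f in y) and g^x = (gradient of f in x) - (cross derivative of l) v. The defaults gamma = 1 and m0 = 10 mirror the AdaSDBO settings reported under Experiment Setup.

```python
import math


def ring_weights(n):
    """Mixing matrix for a ring graph: weight 1/3 on self and on each neighbor.

    Assumption: the summary does not state the topology or the weights.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def mix(W, values):
    """One round of neighbor averaging: values_i <- sum_j W[i][j] * values_j."""
    n = len(values)
    return [sum(W[i][j] * values[j] for j in range(n)) for i in range(n)]


def adasdbo_toy(a, b, T=500, gamma=1.0, m0=10.0, radius=10.0):
    """Toy run of the single-loop adaptive scheme with scalar variables.

    Hypothetical local objectives for agent i:
      lower level  l_i(x, y) = 0.5 * (y - a[i] * x) ** 2
      upper level  f_i(x, y) = 0.5 * (y - b[i]) ** 2
    The projection set V is taken to be the interval [-radius, radius].
    """
    n = len(a)
    W = ring_weights(n)
    x, y, v = [0.0] * n, [0.0] * n, [0.0] * n
    mx, my, mv = [m0] * n, [m0] * n, [m0] * n
    for _ in range(T):
        # Step 3: local gradients (here d2l/dy2 = 1 and d2l/dxdy = -a[i]).
        gy = [y[i] - a[i] * x[i] for i in range(n)]
        gv = [1.0 * v[i] - (y[i] - b[i]) for i in range(n)]
        gx = [0.0 - (-a[i]) * v[i] for i in range(n)]
        # Step 4: accumulate gradient norms, m_new^2 = m^2 + g^2.
        mx = [math.hypot(mx[i], gx[i]) for i in range(n)]
        my = [math.hypot(my[i], gy[i]) for i in range(n)]
        mv = [math.hypot(mv[i], gv[i]) for i in range(n)]
        # Step 5: updates with the accumulator-scaled stepsizes.
        y = [y[i] - gamma / my[i] * gy[i] for i in range(n)]
        v = [v[i] - gamma / max(mv[i], my[i]) * gv[i] for i in range(n)]
        x = [x[i] - gamma / (mx[i] * max(mv[i], my[i])) * gx[i]
             for i in range(n)]
        # Step 6: exchange variables and accumulators with neighbors.
        x, y, v = mix(W, x), mix(W, y), mix(W, v)
        mx, my, mv = mix(W, mx), mix(W, my), mix(W, mv)
        # Step 7: project the auxiliary variable onto V.
        v = [min(max(vi, -radius), radius) for vi in v]
    return x, y, v, (mx, my, mv)


if __name__ == "__main__":
    # n = 5 agents, matching the number of agents reported in the summary.
    x, y, v, _ = adasdbo_toy([0.5, 1.0, 1.5, 1.0, 1.0],
                             [1.0, 2.0, 3.0, 2.0, 2.0])
    print("x:", x)
    print("y:", y)
```

No stepsize is tuned to the problem here: each update is scaled only by the accumulated gradient norms, which is the property the "problem-parameter-free" title refers to.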
Open Source Code No Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The experimental results shown in the submitted manuscript do not depend on private datasets and can be reproduced by following the provided instructions and settings.
Open Datasets Yes We evaluate our method on the hyperparameter optimization task using the MNIST [LeCun et al., 1998] and FMNIST [Xiao et al., 2017] datasets.
Dataset Splits Yes Di and D i represent the training and validation sets, respectively. The batch size for each computing agent is set to 1,000. Each task included a training dataset and a validation dataset, both configured for 5-way classification with 50 shots per class. Specifically, the training and validation data were distributed among different agents to enable cooperative learning. For each task, 30% of the data from the i-th class was assigned to agent i, while the remaining 70% was evenly distributed among the other agents.
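The heterogeneous split described above (30% of class i to agent i, the remaining 70% spread over the other agents) could be sketched as follows. This is a minimal illustration, not the authors' code: the function name `partition_by_class`, the mapping of class c to agent c, the shuffling seed, and the round-robin handling of remainders when the 70% does not divide evenly are all assumptions.

```python
import random


def partition_by_class(labels, n_agents, own_frac=0.3, seed=0):
    """Return one list of sample indices per agent.

    Assumptions: integer class labels, class c is "owned" by agent
    c % n_agents, and leftovers are dealt round-robin to the other agents.
    """
    if n_agents < 2:
        raise ValueError("need at least two agents to share the remainder")
    rng = random.Random(seed)
    agents = [[] for _ in range(n_agents)]

    # Group sample indices by class label.
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)

    for c, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        owner = c % n_agents
        k = round(own_frac * len(idxs))  # share kept by the owning agent
        agents[owner].extend(idxs[:k])
        others = [a for a in range(n_agents) if a != owner]
        # Spread the rest as evenly as possible over the other agents.
        for pos, idx in enumerate(idxs[k:]):
            agents[others[pos % len(others)]].append(idx)
    return agents


# Example: 5 classes with 50 samples each, split across 5 agents.
example_labels = [c for c in range(5) for _ in range(50)]
example_parts = partition_by_class(example_labels, n_agents=5)
```

With 50 samples per class, each agent keeps 15 samples of its own class; the other 35 cannot be divided evenly among four agents, so the per-agent counts differ by at most one per class under the round-robin assumption.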
Hardware Specification Yes All experiments were conducted on a host machine equipped with an Intel(R) Xeon(R) W9-3475X CPU running at 2.20 GHz (maximum turbo frequency: 4.80 GHz), featuring 36 physical cores and 72 threads. The system was configured with 256 GB of DDR5 ECC RAM and a single NVIDIA(R) RTX(TM) A6000 GPU with 48 GB of memory.
Software Dependencies No All experiments were performed with n = 5 using PyTorch [Paszke et al., 2019].
Experiment Setup Yes For all experiments, except for the test accuracy versus stepsize comparison, we use the following parameter settings. For the baseline methods SLDBO and MA-DSBO, the stepsizes for updating x and v are set to 0.01, while the stepsize for updating y is set to 0.02, following the optimal stepsize order described in [Dong et al., 2023, Chen et al., 2023]. For the baseline methods DBO and MDBO, the stepsizes for updating both x and y are set to 0.01. For AdaSDBO, we set $\gamma_x = \gamma_y = \gamma_v = 1$ and initialize $m^x_{i,0} = m^y_{i,0} = m^v_{i,0} = 10$, $i \in [n]$.