Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles
Authors: Lesi Chen, Yaohua Ma, Jingzhao Zhang
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to showcase the superiority of our proposed methods. We implement the algorithms using PyTorch (Paszke et al., 2019). In all the experiments, we tune the step size in the grid {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3} and present the best result of each algorithm. Figure 1: Numerical verification of F2BA on Problem (10). Figure 2: Comparison of deterministic algorithms on Problem (11). Our proposed algorithms F2BA as well as its acceleration AccF2BA outperform baselines. Figure 3: Comparison of stochastic algorithms on Problem (12) under different corruption ratios p. Due to the use of two-time-scale step size, our proposed F2BSA significantly outperforms single-time-scale baselines PBGD and F2SA. |
| Researcher Affiliation | Academia | Lesi Chen* EMAIL IIIS, Tsinghua University Shanghai Qi Zhi Institute Beijing, China Yaohua Ma* EMAIL IIIS, Tsinghua University Beijing, China Jingzhao Zhang EMAIL IIIS, Tsinghua University Shanghai AI Lab Shanghai Qi Zhi Institute Beijing, China |
| Pseudocode | Yes | Algorithm 1 F2BA(x0, y0) Algorithm 2 F2BSA(x0, y0) Algorithm 3 Perturbed F2BA(x0, y0) Algorithm 4 Accelerated F2BA(x0, y0) Algorithm 5 F3BSA(x, y0) Algorithm 6 MLMC-HyperGradEst(x, y0, z0, φf, φg, M, N, S) Algorithm 7 HyperGradEst(x, y0, z0, φf, φg, N, K) Algorithm 8 SGD(h, x0, η, K) Algorithm 9 SGDSC(h, x0, N, K) Algorithm 10 Inexact Perturbed Gradient Descent Algorithm 11 Inexact Nonconvex Accelerated Gradient Descent Algorithm 12 Distributed F2BA(x0, y0, ηx, ηy, ηz, λ, T, K) |
| Open Source Code | No | The paper does not provide an explicit statement or a link to a code repository for their methods. It mentions that they implement the algorithms using PyTorch, but this refers to the use of a third-party library, not the release of their own source code. |
| Open Datasets | Yes | We use the abalone dataset (available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/), which contains 4,177 samples and each sample has 8 features (p = 8). ... We conduct the experiments on the 20 newsgroup dataset which is commonly used by previous works (Grazzi et al., 2020; Ji et al., 2021). ... In our experiment, we use the Alpaca dataset (Taori et al., 2023), which contains 52k instructions and demonstration outputs generated by OpenAI's text-davinci-003 engine. ... Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023. |
| Dataset Splits | Yes | We split the dataset into the training set and the validation set in a 7:3 ratio. ... We split the dataset into a training set and a validation set in an 8:2 ratio |
| Hardware Specification | Yes | We run the experiments on 8 A40 GPUs. |
| Software Dependencies | No | The paper states: 'We implement the algorithms using PyTorch (Paszke et al., 2019).' However, it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In all the experiments, we tune the step size in the grid {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3} and present the best result of each algorithm. ... We run F2BA (Algorithm 1) with K = 10, λ = 10^3... We set the number of inner loops as 10 for all the algorithms, and additionally tune λ from {10^1, 10^2, 10^3, 10^4} for penalty based methods F2BA, AccF2BA, F2SA, and PBGD. For AccF2BA, we set θ = 0.1 as commonly used in momentum-based gradient methods. ... We take γ = 0.99 in our experiments. ... For all the algorithms, we set batch size as 64, and a fixed penalty with λ = 10^3. |
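The grid-search protocol quoted above (try every step size in the grid, report the best run) can be sketched as follows. This is a minimal illustration, not the authors' code: the toy objective f(x) = x², the starting point, and the iteration count are assumptions; the paper tunes over its bilevel problems (10)–(12) instead.

```python
import math

def final_loss(step_size, n_steps=50):
    """Plain gradient descent on the toy objective f(x) = x^2 from x0 = 1.

    Stands in for one training run of an algorithm at a fixed step size.
    """
    x = 1.0
    for _ in range(n_steps):
        x = x - step_size * 2.0 * x  # gradient of x^2 is 2x
        if not math.isfinite(x):     # run diverged: score it as infinitely bad
            return math.inf
    return x * x

# Step-size grid from the paper: {10^-5, ..., 10^3}.
grid = [10.0 ** k for k in range(-5, 4)]

# One run per grid point; keep the step size with the best final objective.
losses = {eta: final_loss(eta) for eta in grid}
best_eta = min(losses, key=losses.get)
```

Treating divergent runs as infinitely bad (rather than letting NaNs propagate into the comparison) is what makes the `min` over the grid well defined when large step sizes blow up.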