An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization

Authors: Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we evaluate our AGM-Bi O method on two different bilevel problems using real and synthetic datasets. We compare its runtime and iteration count with other methods, including a-IRG [12], CG-Bi O [6], Bi-SG [13], SEA [14], R-APM [8], PB-APG [16], and Bisec-Bi O [15].
Researcher Affiliation Academia Jincheng Cao ECE Department UT Austin jinchengcao@utexas.edu; Ruichen Jiang ECE Department UT Austin rjiang@utexas.edu; Erfan Yazdandoost Hamedani SIE Department The University of Arizona erfany@arizona.edu; Aryan Mokhtari ECE Department UT Austin mokhtari@austin.utexas.edu
Pseudocode Yes Algorithm 1 Accelerated Gradient Method for Bilevel Optimization (AGM-Bi O); Algorithm 2 Proximal Accelerated Gradient Method for Bilevel Optimization (P-AGM-Bi O)
Open Source Code Yes The code and data are attached in the supplementary material.
Open Datasets Yes We apply the Wikipedia Math Essential dataset [34] which is composed of a data matrix A Rn d with n = 1068 samples and d = 730 features and an output vector b Rn.
Dataset Splits Yes We use 75% of the dataset as the training set (Atr, btr) and 25% as the validation set (Aval, bval).
Hardware Specification Yes All simulations are implemented using MATLAB R2022a on a PC running mac OS Sonoma with an Apple M1 Pro chip and 16GB Memory.
Software Dependencies Yes All simulations are implemented using MATLAB R2022a on a PC running mac OS Sonoma with an Apple M1 Pro chip and 16GB Memory.
Experiment Setup Yes For our AGM-Bi O method, we set the target tolerances for the absolute suboptimality and infeasibility to ϵf = 10^-4 and ϵg = 10^-4, respectively. We choose the stepsizes as ak = 10^-2(k + 1)/(4Lf). In each iteration, we need to do a projection onto an intersection of a L2-ball and a halfspace, which has a closed-form solution. For a-IRG, we set η0 = 10^-3 and γ0 = 10^-3. For CG-Bi O, we obtain an initial point with FW gap of the lower-level problem less than ϵg/2 = 5 × 10^-5 and choose stepsize γk = 10^-2/(k + 2).