An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization
Authors: Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our AGM-BiO method on two different bilevel problems using real and synthetic datasets. We compare its runtime and iteration count with other methods, including a-IRG [12], CG-BiO [6], Bi-SG [13], SEA [14], R-APM [8], PB-APG [16], and Bisec-BiO [15]. |
| Researcher Affiliation | Academia | Jincheng Cao, ECE Department, UT Austin, jinchengcao@utexas.edu; Ruichen Jiang, ECE Department, UT Austin, rjiang@utexas.edu; Erfan Yazdandoost Hamedani, SIE Department, The University of Arizona, erfany@arizona.edu; Aryan Mokhtari, ECE Department, UT Austin, mokhtari@austin.utexas.edu |
| Pseudocode | Yes | Algorithm 1: Accelerated Gradient Method for Bilevel Optimization (AGM-BiO); Algorithm 2: Proximal Accelerated Gradient Method for Bilevel Optimization (P-AGM-BiO) |
| Open Source Code | Yes | The code and data are attached in the supplementary material. |
| Open Datasets | Yes | We apply the Wikipedia Math Essential dataset [34], which is composed of a data matrix A ∈ ℝ^(n×d) with n = 1068 samples and d = 730 features, and an output vector b ∈ ℝ^n. |
| Dataset Splits | Yes | We use 75% of the dataset as the training set (Atr, btr) and 25% as the validation set (Aval, bval). |
| Hardware Specification | Yes | All simulations are implemented using MATLAB R2022a on a PC running macOS Sonoma with an Apple M1 Pro chip and 16GB memory. |
| Software Dependencies | Yes | All simulations are implemented using MATLAB R2022a on a PC running macOS Sonoma with an Apple M1 Pro chip and 16GB memory. |
| Experiment Setup | Yes | For our AGM-BiO method, we set the target tolerances for the absolute suboptimality and infeasibility to ϵf = 10^-4 and ϵg = 10^-4, respectively, and choose the stepsizes as ak = 10^-2(k + 1)/(4Lf). In each iteration, we need to project onto the intersection of an ℓ2-ball and a halfspace, which has a closed-form solution (a generic sketch of such a projection is given after this table). For a-IRG, we set η0 = 10^-3 and γ0 = 10^-3. For CG-BiO, we obtain an initial point whose Frank-Wolfe (FW) gap for the lower-level problem is less than ϵg/2 = 5 × 10^-5 and choose the stepsize γk = 10^-2/(k + 2). |
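
The closed-form projection mentioned in the experiment setup can be written out explicitly. The sketch below is a generic MATLAB implementation (matching the paper's stated environment) of projecting a point onto the intersection of an ℓ2-ball {x : ‖x − c‖ ≤ r} and a halfspace {x : aᵀx ≤ b}. The function name and the arguments z, c, r, a, b are illustrative placeholders, not the paper's code: in AGM-BiO the specific ball and halfspace are constructed from the algorithm's iterates.

```matlab
% Hypothetical sketch: Euclidean projection of z onto the intersection of the
% ball {x : norm(x - c) <= r} and the halfspace {x : a'*x <= b}.
% All names are placeholders for illustration, not the paper's code.
function x = proj_ball_halfspace(z, c, r, a, b)
    a  = a / norm(a);        % normalize the halfspace normal
    zc = z - c;              % shift coordinates so the ball is centered at 0
    bc = b - a' * c;         % halfspace offset in the shifted coordinates

    % Case 1: z is already feasible.
    if norm(zc) <= r && a' * zc <= bc
        x = z; return;
    end

    % Case 2: only the halfspace constraint is active.
    xh = zc - max(0, a' * zc - bc) * a;
    if norm(xh) <= r
        x = c + xh; return;
    end

    % Case 3: only the ball constraint is active.
    xb = (r / norm(zc)) * zc;
    if a' * xb <= bc
        x = c + xb; return;
    end

    % Case 4: both constraints are active; project onto the ring
    % {x : norm(x) = r, a'*x = bc} closest to zc.
    w = zc - (a' * zc) * a;  % component of zc orthogonal to a
    x = c + bc * a + sqrt(max(r^2 - bc^2, 0)) * (w / norm(w));
end
```

The case order mirrors the four possible KKT active-set patterns (neither constraint active, only the halfspace, only the ball, or both), so whenever the intersection is nonempty the first case whose check passes yields the projection.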