Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

Authors: Jincheng Cao, Ruichen Jiang, Nazanin Abolfazli, Erfan Yazdandoost Hamedani, Aryan Mokhtari

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we test our methods on two different stochastic bilevel optimization problems with real and synthetic datasets and compare them with the existing stochastic methods in [16] and [13]. In Figure 1(a)(b), we observe that SBCGF maintains a smaller lower-level gap than the other methods and converges faster than the rest in terms of upper-level error.
Researcher Affiliation | Academia | Jincheng Cao, ECE Department, UT Austin (jinchengcao@utexas.edu); Ruichen Jiang, ECE Department, UT Austin (rjiang@utexas.edu); Nazanin Abolfazli, SIE Department, The University of Arizona (nazaninabolfazli@arizona.edu); Erfan Yazdandoost Hamedani, SIE Department, The University of Arizona (erfany@arizona.edu); Aryan Mokhtari, ECE Department, UT Austin (mokhtari@austin.utexas.edu)
Pseudocode | Yes | Algorithm 1: SBCGI
Open Source Code | No | The paper mentions third-party tools such as CVX [41, 42] and MATLAB's root-finding solver, but there is no explicit statement or link indicating that the authors have publicly released their own implementation of the described methods.
Open Datasets | Yes | We apply the Wikipedia Math Essential dataset [30], which consists of a data matrix A ∈ R^(n×d) with n = 1068 samples and d = 730 features, and an output vector b ∈ R^n. [30] Benedek Rozemberczki, Paul Scherer, Yixuan He, George Panagopoulos, Alexander Riedel, Maria Astefanoaei, Oliver Kiss, Ferenc Beres, Guzmán López, Nicolas Collignon, et al. PyTorch Geometric Temporal: Spatiotemporal signal processing with neural machine learning models. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 4564–4573, 2021.
Dataset Splits | Yes | To ensure the problem is over-parameterized, we assign 1/3 of the dataset as the training set (A_tr, b_tr), 1/3 as the validation set (A_val, b_val), and the remaining 1/3 as the test set (A_test, b_test).
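The quoted split is a simple three-way equal partition. As a minimal sketch, the stand-in data below uses the quoted dimensions (n = 1068, d = 730) with random values rather than the actual Wikipedia Math Essential dataset, and the shuffling/seed choice is an assumption, not taken from the paper:

```python
import numpy as np

# Stand-in for the Wikipedia Math Essential data matrix and outputs
# (n = 1068 samples, d = 730 features, per the quoted description).
rng = np.random.default_rng(0)
n, d = 1068, 730
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Shuffle once, then assign 1/3 to train, 1/3 to validation, 1/3 to test.
perm = rng.permutation(n)
third = n // 3  # 1068 / 3 = 356 exactly
idx_tr, idx_val, idx_test = perm[:third], perm[third:2 * third], perm[2 * third:]

A_tr, b_tr = A[idx_tr], b[idx_tr]
A_val, b_val = A[idx_val], b[idx_val]
A_test, b_test = A[idx_test], b[idx_test]

# Each split has 356 samples but 730 features, so the training problem
# is over-parameterized (d > n/3), as the quoted text intends.
print(A_tr.shape, A_val.shape, A_test.shape)
```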
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions CVX [41, 42] and MATLAB's root-finding solver but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | We query the stochastic oracle 9×10^5 times with stepsizes γ_t = 0.01/(t + 1) and γ = 10^(-5) for SBCGI (Algorithm 1) and SBCGF (Algorithm 2) with K_t = 10^4/√(t + 1), respectively. For aR-IP-SeG, we choose γ_0 = 10^(-7) and ρ_0 = 10^3. For DBGD, we set α = β = 1 and γ_t = 10^(-6).
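The stepsize values above are reconstructed from extraction-garbled text, so the signs of the exponents (e.g. 10^(-5) rather than 10^5) are assumptions based on their role as stepsizes. A minimal sketch of the quoted schedules under those assumptions:

```python
# Sketch of the quoted experiment hyperparameters. Exponent signs are
# assumptions recovered from garbled extraction, not verbatim values.

def gamma_sbcgi(t: int) -> float:
    """Diminishing stepsize gamma_t = 0.01 / (t + 1) quoted for SBCGI."""
    return 0.01 / (t + 1)

GAMMA_SBCGF = 1e-5      # assumed constant stepsize gamma for SBCGF
GAMMA0_ARIPSEG = 1e-7   # assumed initial stepsize gamma_0 for aR-IP-SeG
RHO0_ARIPSEG = 1e3      # regularization parameter rho_0 for aR-IP-SeG
GAMMA_DBGD = 1e-6       # assumed constant stepsize for DBGD (alpha = beta = 1)

# The SBCGI stepsize decays harmonically with the iteration counter t.
print([round(gamma_sbcgi(t), 6) for t in range(5)])
```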