Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach

Authors: Prashant Khanduri, Ioannis Tsaknakis, Yihua Zhang, Jia Liu, Sijia Liu, Jiawei Zhang, Mingyi Hong

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we experimentally corroborate the theoretical findings and evaluate the performance of the proposed framework on numerical and adversarial learning problems."
Researcher Affiliation | Academia | "(1) Department of CS, Wayne State University, Detroit, MI 48202, USA; (2) Department of ECE, University of Minnesota, Minneapolis, MN 55455, USA; (3) Department of CSE, Michigan State University, East Lansing, MI 48824, USA; (4) Department of ECE, The Ohio State University, Columbus, OH 43210, USA; (5) Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA."
Pseudocode | Yes | Algorithm 1: [Deterministic] Smoothed Implicit Gradient Descent ([D]SIGD); Algorithm 2: [Stochastic] Smoothed Implicit Gradient Descent ([S]SIGD); Algorithm 3: Projected Gradient Descent (PGD)
Open Source Code | Yes | "The code can be found in the following link: https://anonymous.4open.science/r/icml23-bilevel-gaussian/"
Open Datasets | Yes | "We consider two representative datasets CIFAR-10/100 (Krizhevsky et al., 2009) and adopt the ResNet-18 (He et al., 2016) model; the results for CIFAR-10 are provided in Appendix C."
Dataset Splits | No | The paper mentions 'training set' and 'test set' but does not provide specific percentages, counts, or an explicit methodology for training/validation/test splits.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions models and baselines but does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "In the first two cases, we solve the inner-level problem using 10 steps of projected gradient descent with stepsize 10^-1. For the stepsize of [S]SIGD, we choose β = 0.1, while in [D]SIGD we find the proper Armijo step-size by successively adapting (by increasing m) the quantity α_r = (0.9)^m until condition (12) is met. In PDBO we select 10^-1 for the stepsizes of both the primal and dual steps, and the number of inner iterations is set to 10." and "In the implementation of our [S]SIGD method, we adopt a perturbation generated by a Gaussian random vector q with variance σ^2 ∈ {2e-5, 4e-5, 6e-5, 8e-5, 1e-4}, in order to study different levels of smoothness. We choose f_i to be the cross-entropy loss and h_i = f_i + λ||y_i||^2 for a hyper-parameter λ > 0." Illustrative sketches of these reported components follow the table.
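
The inner-level solver reported above is plain projected gradient descent (Algorithm 3 in the paper) run for 10 steps with stepsize 10^-1. A minimal sketch is given below; the toy quadratic objective, the nonnegative-orthant constraint, and the `project` helper are illustrative assumptions, not the paper's actual linearly constrained inner problem.

```python
import numpy as np

def pgd(grad, project, y0, stepsize=0.1, num_steps=10):
    """Projected gradient descent: y <- Proj(y - stepsize * grad(y))."""
    y = np.array(y0, dtype=float)
    for _ in range(num_steps):
        y = project(y - stepsize * grad(y))
    return y

# Toy instance: minimize 0.5 * ||y - c||^2 over the nonnegative orthant.
c = np.array([1.0, -2.0, 0.5])
grad = lambda y: y - c                  # gradient of the toy objective
project = lambda y: np.maximum(y, 0.0)  # projection onto {y : y >= 0}
print(pgd(grad, project, y0=np.zeros(3)))  # approaches [1.0, 0.0, 0.5] as steps grow
```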
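The [D]SIGD step-size rule tries α_r = (0.9)^m for increasing m until the paper's condition (12) is satisfied. Condition (12) is not reproduced in the excerpt, so the sketch below substitutes the standard Armijo sufficient-decrease test; the toy objective, the c1 constant, and the negative-gradient direction are assumptions for illustration only.

```python
import numpy as np

def armijo_stepsize(f, grad_f, x, rho=0.9, c1=1e-4, max_backtracks=50):
    """Try alpha = rho**m for m = 0, 1, 2, ... and return the first alpha
    satisfying the standard Armijo sufficient-decrease condition."""
    g = grad_f(x)
    d = -g            # descent direction: negative gradient
    fx = f(x)
    for m in range(max_backtracks):
        alpha = rho ** m
        if f(x + alpha * d) <= fx + c1 * alpha * g.dot(d):
            return alpha
    return rho ** max_backtracks

# Toy usage on f(x) = 0.5 * ||x||^2
f = lambda x: 0.5 * x.dot(x)
grad_f = lambda x: x
print(armijo_stepsize(f, grad_f, np.array([3.0, -4.0])))  # alpha = 1.0 for this toy case
```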
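The [S]SIGD entry reports a Gaussian perturbation q with variance σ^2 drawn from {2e-5, 4e-5, 6e-5, 8e-5, 1e-4}. The snippet below shows only a generic Monte Carlo estimate of a Gaussian-smoothed function value, f_σ(x) = E_q[f(x + q)] with q ~ N(0, σ^2 I); it is not the paper's smoothed implicit gradient construction, and the quadratic test function is an assumption.

```python
import numpy as np

def gaussian_smoothed_value(f, x, sigma2, num_samples=200, seed=0):
    """Monte Carlo estimate of f_sigma(x) = E_q[f(x + q)], q ~ N(0, sigma2 * I)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(scale=np.sqrt(sigma2), size=(num_samples, x.shape[0]))
    return np.mean([f(x + q) for q in samples])

x = np.array([1.0, 2.0])
f = lambda z: 0.5 * z.dot(z)  # toy function to smooth
for sigma2 in [2e-5, 4e-5, 6e-5, 8e-5, 1e-4]:  # variances from the reported setup
    print(sigma2, gaussian_smoothed_value(f, x, sigma2))
```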
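Finally, the adversarial-learning setup states that f_i is the cross-entropy loss and h_i = f_i + λ||y_i||^2 for some λ > 0. A minimal PyTorch-style sketch of that regularized lower-level objective follows; the λ value, the argument names, and the toy linear model are assumptions.

```python
import torch
import torch.nn.functional as F

def lower_level_objective(logits, targets, y_params, lam=0.01):
    """h_i = f_i + lam * ||y_i||^2, with f_i the cross-entropy loss.
    lam = 0.01 is illustrative; the paper only requires lam > 0."""
    f_i = F.cross_entropy(logits, targets)
    l2 = sum(p.pow(2).sum() for p in y_params)
    return f_i + lam * l2

# Toy usage with a small linear model whose weights play the role of y_i
model = torch.nn.Linear(8, 10)
x, targets = torch.randn(4, 8), torch.randint(0, 10, (4,))
loss = lower_level_objective(model(x), targets, y_params=model.parameters())
loss.backward()
```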