Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach
Authors: Prashant Khanduri, Ioannis Tsaknakis, Yihua Zhang, Jia Liu, Sijia Liu, Jiawei Zhang, Mingyi Hong
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we experimentally corroborate the theoretical findings and evaluate the performance of the proposed framework on numerical and adversarial learning problems. |
| Researcher Affiliation | Academia | 1Department of CS, Wayne State University, Detroit, MI 48202, USA 2Department of ECE, University of Minnesota, Minneapolis, MN 55455, USA 3Department of CSE, Michigan State University, East Lansing, MI 48824, USA 4Department of ECE, The Ohio State University, Columbus, OH 43210, USA 5Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. |
| Pseudocode | Yes | Algorithm 1 [Deterministic] Smoothed Implicit Gradient Descent ([D]SIGD), Algorithm 2 [Stochastic] Smoothed Implicit Gradient Descent ([S]SIGD), Algorithm 3 Projected Gradient Descent (PGD) |
| Open Source Code | Yes | The code can be found in the following link: https://anonymous.4open.science/r/icml23-bilevel-gaussian/ |
| Open Datasets | Yes | We consider two representative datasets CIFAR-10/100 (Krizhevsky et al., 2009) and adopt the ResNet-18 (He et al., 2016) model; the results for CIFAR-10 are provided in Appendix C. |
| Dataset Splits | No | The paper mentions 'training set' and 'test set' but does not provide specific percentages, counts, or explicit methodology for training/validation/test splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions models and baselines but does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | In the first two cases, we solve the inner-level problem using 10 steps of projected gradient descent with stepsize 10^-1. For the stepsize of [S]SIGD, we choose β = 0.1, while in [D]SIGD we find the proper Armijo step-size by successively adapting (by increasing m) the quantity a_r = (0.9)^m until condition (12) is met. In PDBO we select 10^-1 for the stepsizes of both the primal and dual steps, and the number of inner iterations is set to 10. and In the implementation of our [S]SIGD method, we adopt a perturbation generated by a Gaussian random vector q with variances from the following list σ^2 ∈ {2e-5, 4e-5, 6e-5, 8e-5, 1e-4}, in order to study different levels of smoothness. We choose f_i to be cross-entropy loss and h_i = f_i + λ‖y_i‖^2 for hyper-parameter λ > 0. |
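
The Pseudocode row above names Algorithm 3 (PGD) as the inner-level solver, and the Experiment Setup row states that the inner problem is solved with 10 projected-gradient steps of stepsize 10^-1. As a reading aid only, here is a minimal NumPy sketch of that inner loop; the projection used is onto the nonnegative orthant, a simple special case of the paper's general linear constraints, and the names `inner_pgd` and `project_nonneg` are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def project_nonneg(y):
    # Euclidean projection onto {y : y >= 0}, a simple special case of
    # linear constraints (general A y <= b would require a QP solve).
    return np.maximum(y, 0.0)

def inner_pgd(grad_g, y0, steps=10, stepsize=1e-1):
    # Projected gradient descent on the inner-level problem:
    # defaults match the "10 steps, stepsize 1e-1" quoted above.
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(steps):
        y = project_nonneg(y - stepsize * grad_g(y))
    return y

if __name__ == "__main__":
    # Toy inner objective g(y) = 0.5 * ||y - c||^2 with one negative entry
    # in c, so the constraint is active at the solution.
    c = np.array([0.5, -0.3, 1.2])
    y_star = inner_pgd(lambda y: y - c, y0=np.zeros(3), steps=200)
    print(y_star)  # approaches the projected optimum max(c, 0) = [0.5, 0.0, 1.2]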
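```

The Experiment Setup row also mentions the two stepsize rules: [S]SIGD uses a fixed β together with a Gaussian smoothing perturbation q, while [D]SIGD backtracks a_r = (0.9)^m until condition (12) holds. The sketch below is a hedged illustration of those two ingredients under assumed function signatures (`upper_loss`, `F`); the sufficient-decrease constant `c` is a placeholder, since condition (12) itself is not reproduced in this report.

```python
import numpy as np

def smoothed_upper_loss(upper_loss, x, y_star, sigma2=2e-5, rng=None):
    # [S]SIGD-style smoothing: evaluate the upper-level loss at x + q,
    # where q ~ N(0, sigma2 * I); variances are taken from the list
    # {2e-5, 4e-5, 6e-5, 8e-5, 1e-4} quoted in the setup above.
    rng = np.random.default_rng() if rng is None else rng
    q = rng.normal(scale=np.sqrt(sigma2), size=x.shape)
    return upper_loss(x + q, y_star)

def armijo_stepsize(F, x, grad, c=1e-4, m_max=50):
    # [D]SIGD-style backtracking: shrink a_r = (0.9)**m by increasing m
    # until a sufficient-decrease test holds (a stand-in for condition (12)).
    fx = F(x)
    for m in range(m_max + 1):
        a = 0.9 ** m
        if F(x - a * grad) <= fx - c * a * float(np.dot(grad, grad)):
            return a
    return 0.9 ** m_max
```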