First-Order Methods for Linearly Constrained Bilevel Optimization
Authors: Guy Kornowski, Swati Padmanabhan, Kai Wang, Zhe Zhang, Suvrit Sra
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments verify these guarantees. |
| Researcher Affiliation | Academia | Weizmann Institute of Science. guy.kornowski@weizmann.ac.il Massachusetts Institute of Technology. pswt@mit.edu Georgia Institute of Technology. kwang692@gatech.edu Purdue University. zhan5111@purdue.edu Massachusetts Institute of Technology. suvrit@mit.edu |
| Pseudocode | Yes | Algorithm 1 Inexact Gradient Oracle for Bilevel Program with Linear Equality Constraint |
| Open Source Code | Yes | The implementation can be found in https://github.com/guaguakai/constrained-bilevel-optimization. |
| Open Datasets | No | We generate instances of the following constrained bilevel optimization problem: minx c y + 0.01 x 2 + 0.01 y 2 s.t. y = arg miny:h(x,y) 0 1 2y Qy + x Py, (6.1) where h(x, y) = Ay b is a dh-dim linear constraint. The PSD matrix Q Rdy dy, c Rdy, P Rdx dy, and constraints A Rdh dy, b Rdh are randomly generated from normal distributions (cf. Appendix K). |
| Dataset Splits | No | The paper describes generating synthetic data instances but does not specify any training, validation, or test dataset splits, percentages, or absolute sample counts. |
| Hardware Specification | Yes | All experiments were run on a computing cluster with Dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz and DDR4-2933 MHz DRAM. No GPU was used, and we used 1 core with 8GB RAM per instance of the experiment. |
| Software Dependencies | No | All algorithms are implemented in Py Torch [94] to compute gradients, and using Cvxpy [95] to solve the LL problem and the penalty minimization problem. (No version numbers provided for PyTorch or CVXPY) |
| Experiment Setup | Yes | Both algorithms use Adam [90] to control the learning rate, and are averaged over 10 random seeds. We run Algorithm 3 using Algorithm 4 on the bilevel optimization in the toy example in Problem L.1 with dx = 100, dy = 200, nconst = dy/5, and accuracy α = 0.1. The cutoff time for running the algorithms is set to be 6 hours. |