First-Order Methods for Linearly Constrained Bilevel Optimization

Authors: Guy Kornowski, Swati Padmanabhan, Kai Wang, Zhe Zhang, Suvrit Sra

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our numerical experiments verify these guarantees. |
| Researcher Affiliation | Academia | Weizmann Institute of Science (guy.kornowski@weizmann.ac.il); Massachusetts Institute of Technology (pswt@mit.edu); Georgia Institute of Technology (kwang692@gatech.edu); Purdue University (zhan5111@purdue.edu); Massachusetts Institute of Technology (suvrit@mit.edu) |
| Pseudocode | Yes | Algorithm 1: Inexact Gradient Oracle for Bilevel Program with Linear Equality Constraint |
| Open Source Code | Yes | The implementation can be found in https://github.com/guaguakai/constrained-bilevel-optimization. |
| Open Datasets | No | We generate instances of the following constrained bilevel optimization problem: $\min_x\; c^\top y + 0.01\lVert x\rVert^2 + 0.01\lVert y\rVert^2$ s.t. $y = \arg\min_{y:\, h(x,y)\le 0} \tfrac{1}{2} y^\top Q y + x^\top P y$ (6.1), where $h(x,y) = Ay - b$ is a $d_h$-dimensional linear constraint. The PSD matrix $Q \in \mathbb{R}^{d_y \times d_y}$, $c \in \mathbb{R}^{d_y}$, $P \in \mathbb{R}^{d_x \times d_y}$, and constraints $A \in \mathbb{R}^{d_h \times d_y}$, $b \in \mathbb{R}^{d_h}$ are randomly generated from normal distributions (cf. Appendix K). |
| Dataset Splits | No | The paper describes generating synthetic data instances but does not specify any training, validation, or test dataset splits, percentages, or absolute sample counts. |
| Hardware Specification | Yes | All experiments were run on a computing cluster with Dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz and DDR4-2933 MHz DRAM. No GPU was used, and we used 1 core with 8GB RAM per instance of the experiment. |
| Software Dependencies | No | All algorithms are implemented in PyTorch [94] to compute gradients, and using CVXPY [95] to solve the LL problem and the penalty minimization problem. (No version numbers are provided for PyTorch or CVXPY.) |
| Experiment Setup | Yes | Both algorithms use Adam [90] to control the learning rate, and are averaged over 10 random seeds. We run Algorithm 3 using Algorithm 4 on the bilevel optimization in the toy example in Problem L.1 with $d_x = 100$, $d_y = 200$, $n_{\text{const}} = d_y/5$, and accuracy $\alpha = 0.1$. The cutoff time for running the algorithms is set to be 6 hours. |