BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
Authors: Bo Liu, Mao Ye, Stephen Wright, Peter Stone, Qiang Liu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance. |
| Researcher Affiliation | Collaboration | Bo Liu¹; Mao Ye¹; Stephen Wright²; Peter Stone¹,³; Qiang Liu¹ (¹The University of Texas at Austin; ²University of Wisconsin-Madison; ³Sony AI) |
| Pseudocode | Yes | Algorithm 1 Bilevel Optimization Made Easy (BOME!); see the update sketch after this table. |
| Open Source Code | Yes | 3. (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | For the dataset, we use MNIST [9] (Fashion MNIST [50]). |
| Dataset Splits | Yes | The stepsizes of all methods are set by a grid search from the set {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000}. All toy problems adopt vanilla gradient descent (GD), and applications on hyperparameter optimization adopt GD with a momentum of 0.9. |
| Hardware Specification | No | The paper states in its checklist: "3. (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]". No specific hardware details are mentioned in the main text. |
| Software Dependencies | No | The paper mentions using "Adam [26]" as an optimizer, but does not provide specific version numbers for any software dependencies, programming languages, or libraries like PyTorch, TensorFlow, Python, or CUDA. |
| Experiment Setup | Yes | Unless otherwise specified, BOME strictly follows Algorithm 1 with φ_k = η‖∇q̂(v_k, θ_k)‖², η = 0.5, and T = 10. The inner stepsize is set to be the same as the outer stepsize. The stepsizes of all methods are set by a grid search from the set {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000}. All toy problems adopt vanilla gradient descent (GD), and applications on hyperparameter optimization adopt GD with a momentum of 0.9. Hypothetical sketches of the update rule and of the stepsize grid search follow this table. |
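To connect the Pseudocode and Experiment Setup rows, below is a minimal NumPy sketch of one outer iteration of a BOME-style update, assembled from the quoted description (T inner gradient steps, control parameter φ_k = η‖∇q̂‖² with η = 0.5, joint first-order update of both variables). The function names, the `f_grad`/`g_grad` signatures, the (x, y) naming of outer/inner variables, and the 1e-12 safeguard are our own assumptions for illustration, not the authors' released code.

```python
import numpy as np

def bome_step(x, y, f_grad, g_grad, T=10, inner_lr=0.1, outer_lr=0.1, eta=0.5):
    """One outer iteration of a BOME-style first-order bilevel update (sketch).

    x, y        : 1-D NumPy arrays for the outer and inner variables (naming is ours).
    f_grad(x,y) : returns (df_dx, df_dy), gradients of the outer objective f.
    g_grad(x,y) : returns (dg_dx, dg_dy), gradients of the inner objective g.
    T, eta      : taken from the Experiment Setup row (T = 10, eta = 0.5).
    """
    # 1) Approximate the inner minimizer y*(x) with T plain gradient steps on g(x, .).
    z = y.copy()
    for _ in range(T):
        _, dg_dz = g_grad(x, z)
        z = z - inner_lr * dg_dz

    # 2) Gradient of the value-function surrogate q_hat(x, y) = g(x, y) - g(x, z),
    #    treating z as a constant.
    dg_dx, dg_dy = g_grad(x, y)
    dgz_dx, _ = g_grad(x, z)
    dq_dx, dq_dy = dg_dx - dgz_dx, dg_dy

    # 3) Multiplier with the control parameter phi = eta * ||grad q_hat||^2.
    df_dx, df_dy = f_grad(x, y)
    q_sq = float(np.dot(dq_dx, dq_dx) + np.dot(dq_dy, dq_dy))
    phi = eta * q_sq
    lam = max((phi - float(np.dot(df_dx, dq_dx) + np.dot(df_dy, dq_dy)))
              / (q_sq + 1e-12), 0.0)  # 1e-12 is our numerical safeguard

    # 4) Joint first-order update of the outer and inner variables.
    x_new = x - outer_lr * (df_dx + lam * dq_dx)
    y_new = y - outer_lr * (df_dy + lam * dq_dy)
    return x_new, y_new
```

Note that every quantity above is a plain gradient of f or g, which is consistent with the "simple first-order approach" in the paper's title: no implicit differentiation or second-order terms appear.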
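The stepsize grid search quoted in the Dataset Splits and Experiment Setup rows could be reproduced along the following lines. The `run_experiment` callback and the selection criterion (lowest final outer objective) are placeholders, since the quoted text only specifies the candidate set.

```python
# Hypothetical reproduction of the stepsize grid search quoted in the table.
STEP_SIZES = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]

def grid_search_stepsize(run_experiment, step_sizes=STEP_SIZES):
    """run_experiment(step_size) -> scalar score (assumed: final outer objective,
    lower is better); both the callback and the criterion are placeholders."""
    scores = {lr: run_experiment(lr) for lr in step_sizes}
    best = min(scores, key=scores.get)
    return best, scores

# Per the quote, toy problems would pass a vanilla-GD runner here, while the
# hyperparameter-optimization experiments would use GD with momentum 0.9.
```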