BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Authors: Bo Liu, Mao Ye, Stephen Wright, Peter Stone, Qiang Liu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide a non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance."
Researcher Affiliation | Collaboration | Bo Liu¹, Mao Ye¹, Stephen Wright², Peter Stone¹,³, Qiang Liu¹; ¹The University of Texas at Austin, ²University of Wisconsin-Madison, ³Sony AI
Pseudocode | Yes | "Algorithm 1: Bilevel Optimization Made Easy (BOME!)" (a first-order sketch of this update appears after the table)
Open Source Code | Yes | Checklist item 3(a): "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | Yes | "For the dataset, we use MNIST [9] (Fashion MNIST [50])."
Dataset Splits | Yes | "The stepsizes of all methods are set by a grid search from the set {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000}. All toy problems adopt vanilla gradient descent (GD) and applications on hyperparameter optimization adopt GD with a momentum of 0.9."
Hardware Specification | No | The paper answers checklist item 3(d), "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?", with [No]; no specific hardware details are mentioned in the main text.
Software Dependencies | No | The paper mentions using "Adam [26]" as an optimizer, but does not provide specific version numbers for any software dependencies, programming languages, or libraries such as PyTorch, TensorFlow, Python, or CUDA.
Experiment Setup | Yes | "Unless otherwise specified, BOME strictly follows Algorithm 1 with φ_k = η‖∇q̂(v_k, θ_k)‖², η = 0.5, and T = 10. The inner stepsize is set to be the same as the outer stepsize. The stepsizes of all methods are set by a grid search from the set {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000}. All toy problems adopt vanilla gradient descent (GD) and applications on hyperparameter optimization adopt GD with a momentum of 0.9." (both the update rule and the stepsize search are sketched after the table)
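
To make the quoted Pseudocode and Experiment Setup rows concrete, here is a minimal NumPy sketch of the single-loop BOME update as we read it from the paper: T inner gradient steps produce θ_k, the value-function surrogate q̂(x, y) = g(x, y) − g(x, θ_k) supplies a constraint gradient, and one joint first-order step combines it with ∇f through a multiplier λ_k. Only η = 0.5, T = 10, and φ_k = η‖∇q̂‖² come from the quoted setup; the toy objectives, the λ_k formula, and the stepsizes alpha and xi are our illustrative assumptions, not the authors' code.

import numpy as np

# Toy bilevel problem (illustrative only, not from the paper):
#   outer objective  f(x, y) = ||y - 1||^2 + ||x||^2
#   inner objective  g(x, y) = ||y - x||^2, minimized at y*(x) = x
def grad_f(x, y):
    return 2 * x, 2 * (y - 1.0)

def grad_g(x, y):
    return -2 * (y - x), 2 * (y - x)  # (d/dx, d/dy)

def bome_step(x, y, T=10, eta=0.5, alpha=0.5, xi=0.05):
    """One BOME-style update; eta and T follow the quoted setup,
    alpha (inner) and xi (outer) stepsizes are illustrative."""
    # 1) Approximate the inner solution with T gradient-descent steps on g(x, .).
    theta = y.copy()
    for _ in range(T):
        theta = theta - alpha * grad_g(x, theta)[1]

    # 2) Gradient of the surrogate q_hat(x, y) = g(x, y) - g(x, theta), theta held fixed.
    gx_y, gy_y = grad_g(x, y)
    gx_t, _ = grad_g(x, theta)
    dq = np.concatenate([gx_y - gx_t, gy_y])

    # 3) Multiplier: phi_k = eta * ||grad q_hat||^2 as quoted; the lambda_k
    #    formula below is our reading of the paper, not a verbatim copy.
    fx, fy = grad_f(x, y)
    df = np.concatenate([fx, fy])
    sq = float(np.dot(dq, dq))
    phi = eta * sq
    lam = max(phi - float(np.dot(df, dq)), 0.0) / (sq + 1e-12)

    # 4) One joint first-order step on (x, y).
    step = df + lam * dq
    n = x.size
    return x - xi * step[:n], y - xi * step[n:]

x, y = np.array([2.0]), np.array([-1.0])
for _ in range(300):
    x, y = bome_step(x, y)
print(x, y)  # approaches the toy problem's bilevel solution x = y = 0.5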
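
The Experiment Setup row also describes how stepsizes were chosen. Below is a minimal sketch of that kind of grid search, assuming a hypothetical run_trial(stepsize, momentum) routine that performs one training run (GD with momentum 0.9 for the hyperparameter-optimization tasks) and returns a validation loss; only the grid values and the momentum come from the quoted setup.

# Sketch of the quoted stepsize grid search. `run_trial` is a hypothetical
# stand-in for one full training run that returns a validation loss.
STEPSIZE_GRID = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]

def select_stepsize(run_trial, momentum=0.9):
    """Return (best stepsize, all results) after trying every grid value."""
    results = {lr: run_trial(stepsize=lr, momentum=momentum) for lr in STEPSIZE_GRID}
    best = min(results, key=results.get)
    return best, results

# Example usage with a dummy run_trial that prefers stepsize 0.5:
best, _ = select_stepsize(lambda stepsize, momentum: abs(stepsize - 0.5))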