Non-Convex Bilevel Optimization with Time-Varying Objective Functions

Authors: Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, Ness Shroff

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across multiple domains corroborate the effectiveness of SOBOW. "In this section, we conduct experiments in multiple domains to corroborate the utility of the OBO framework and the effectiveness of SOBOW."
Researcher Affiliation | Academia | Sen Lin, Department of CS, University of Houston, slin50@central.uh.edu; Daouda Sow, Department of ECE, The Ohio State University, sow.53@osu.edu; Kaiyi Ji, Department of CSE, University at Buffalo, kaiyiji@buffalo.edu; Yingbin Liang, Department of ECE, The Ohio State University, liang889@osu.edu; Ness Shroff, Department of ECE & CSE, The Ohio State University, shroff.11@osu.edu
Pseudocode | Yes | Algorithm 1 (general procedure of OBO) and Algorithm 2 (Single-loop Online Bilevel Optimizer with Window averaging, SOBOW). An illustrative single-loop sketch appears after the table.
Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the described methodology.
Open Datasets | No | "We consider an online classification setting on the 20 Newsgroups dataset, where the classifier is modeled by an affine transformation and we use the cross-entropy loss as the loss function." The paper mentions the dataset by name but does not provide a specific link, DOI, repository, or a formal citation with authors and year for its access.
Dataset Splits | Yes | Specifically, at each online round t, the agent applies the hyperparameters λ_t and the model w_t, and then receives a small dataset D_t = {D_t^tr, D_t^val} composed of a training subset D_t^tr and a validation subset D_t^val. A per-round split sketch follows the table.
Hardware Specification | No | The paper discusses computational efficiency and running time (e.g., "SOBOW takes 11 seconds, OAGD takes 228 seconds and OGD takes 7 seconds"), but does not provide specific details about the hardware (e.g., CPU or GPU models) used for these experiments.
Software Dependencies | No | The paper describes algorithms and methods used (e.g., "implicit differentiation with the fixed point method"), but does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). A fixed-point hypergradient sketch follows the table.
Experiment Setup | Yes | "For example, we achieve the best performance by setting both the inner and outer step sizes to 10^-4 for the online hyper-representation learning experiments, and small values around that scale yield the same performance. For the dynamic OHO experiments, only the outer step size is set manually to 0.01. The inner step size is optimized along with the other regularization hyperparameters. For both setups the batch size is fixed to 16." The quoted values are collected into a configuration sketch after the table.
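
The pseudocode row above refers to Algorithm 1 (the general OBO procedure) and Algorithm 2 (SOBOW). The sketch below only illustrates the general shape of a single-loop online bilevel update with window-averaged hypergradients; the quadratic stand-in objectives, the plain gradient updates, and the fixed window length are assumptions made for illustration and are not taken from the paper's Algorithm 2.

```python
import numpy as np
from collections import deque

# Illustrative single-loop online bilevel loop with window-averaged
# hypergradients (objectives and update rules are assumptions, not the paper's).
def run_online_bilevel(rounds, dim=3, window=5, alpha=1e-2, beta=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)            # inner variable (model parameters)
    lam = np.zeros(dim)          # outer variable (hyperparameters)
    recent_hypergrads = deque(maxlen=window)

    for t in range(rounds):
        # Time-varying quadratic stand-ins for the inner loss g_t and outer loss f_t.
        target = rng.normal(size=dim)
        inner_grad = w - lam                  # grad_w g_t(w, lam) for g_t = 0.5*||w - lam||^2
        w = w - alpha * inner_grad            # one inner gradient step (single loop)

        outer_grad_w = w - target             # grad_w f_t(w) for f_t = 0.5*||w - target||^2
        # Crude hypergradient estimate: here dw*/dlam = I, so the chain rule gives outer_grad_w.
        recent_hypergrads.append(outer_grad_w)

        # Window averaging of the most recent hypergradient estimates.
        avg_hypergrad = np.mean(np.asarray(recent_hypergrads), axis=0)
        lam = lam - beta * avg_hypergrad      # one outer gradient step

    return w, lam

if __name__ == "__main__":
    w, lam = run_online_bilevel(rounds=200)
    print("final w:", w, "final lam:", lam)
```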
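
The open-datasets and dataset-splits rows describe an online classification setting on 20 Newsgroups with an affine classifier, cross-entropy loss, and a per-round dataset D_t split into a training subset D_t^tr and a validation subset D_t^val. Below is a minimal sketch of constructing such per-round splits, assuming scikit-learn's fetch_20newsgroups loader, TF-IDF features, and an even split of each batch of 16; none of these choices are specified in the paper.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative construction of per-round datasets D_t = {D_t^tr, D_t^val}.
# The loader and TF-IDF featurization are stand-ins; the paper does not
# specify how the dataset was obtained or featurized.
data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = np.asarray(data.target)

rng = np.random.default_rng(0)
batch_size = 16  # matches the batch size quoted in the experiment-setup row

def next_round_split(val_fraction=0.5):
    """Sample one round's batch and split it into training and validation subsets."""
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    n_val = int(val_fraction * batch_size)
    val_idx, tr_idx = idx[:n_val], idx[n_val:]
    return (X[tr_idx], y[tr_idx]), (X[val_idx], y[val_idx])

(D_tr, y_tr), (D_val, y_val) = next_round_split()
print(D_tr.shape, D_val.shape)
```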
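
The software-dependencies row quotes the paper's use of "implicit differentiation with the fixed point method". The sketch below shows the textbook fixed-point approximation of a hypergradient for a simple quadratic inner problem; it is not the paper's implementation, and the objectives, step size, and iteration counts are assumptions.

```python
import numpy as np

# Fixed-point hypergradient approximation (textbook form, not the paper's code).
# Inner mapping: w <- Phi(w, lam) = w - alpha * grad_w g(w, lam) with
# g(w, lam) = 0.5*||w - lam||^2, so dPhi/dw = (1 - alpha) I and dPhi/dlam = alpha I.
# Outer loss: f(w) = 0.5*||w - c||^2.
def fixed_point_hypergrad(lam, c, alpha=0.1, inner_steps=100, fp_steps=200):
    dim = lam.size
    # 1) Approximate the inner solution w*(lam) by iterating Phi.
    w = np.zeros(dim)
    for _ in range(inner_steps):
        w = w - alpha * (w - lam)

    grad_f_w = w - c                      # grad_w f at w*(lam)
    # 2) Solve v = (dPhi/dw)^T v + grad_f_w by fixed-point iteration.
    v = np.zeros(dim)
    for _ in range(fp_steps):
        v = (1.0 - alpha) * v + grad_f_w
    # 3) Hypergradient: (dPhi/dlam)^T v  (the direct grad_lam f term is zero here).
    return alpha * v

if __name__ == "__main__":
    lam = np.array([0.0, 1.0])
    c = np.array([2.0, -1.0])
    # Since w*(lam) = lam, the exact hypergradient is lam - c.
    print(fixed_point_hypergrad(lam, c))
```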
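
Finally, the hyperparameters quoted in the experiment-setup row can be collected into one configuration. Only the numeric values come from the quoted text; the key names and dictionary layout below are illustrative.

```python
# Hyperparameters quoted in the experiment-setup row; key names are illustrative.
EXPERIMENT_CONFIG = {
    "online_hyper_representation": {
        "inner_step_size": 1e-4,   # best performance reported at 10^-4
        "outer_step_size": 1e-4,   # values of a similar scale reportedly perform the same
        "batch_size": 16,
    },
    "dynamic_OHO": {
        "outer_step_size": 1e-2,   # only the outer step size is set manually
        "inner_step_size": None,   # learned jointly with the regularization hyperparameters
        "batch_size": 16,
    },
}
```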