On the Convergence Theory for Hessian-Free Bilevel Algorithms
Authors: Daouda Sow, Kaiyi Ji, Yingbin Liang
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate that the proposed algorithms outperform baseline bilevel optimizers on various bilevel problems. Particularly, in our experiment on few-shot meta-learning with a ResNet-12 network over the miniImageNet dataset, we show that our algorithm outperforms baseline meta-learning algorithms, while other baseline bilevel optimizers do not solve such meta-learning problems within a comparable time frame. We validate our algorithms in four bilevel problems: shallow hyper-representation (HR) with a linear/2-layer net embedding model on synthetic data, deep HR with a LeNet network [32] on the MNIST dataset, few-shot meta-learning with ResNet-12 on the miniImageNet dataset, and hyperparameter optimization (HO) on the 20 Newsgroup dataset. |
| Researcher Affiliation | Academia | Daouda A. Sow, Department of ECE, The Ohio State University (sow.53@osu.edu); Kaiyi Ji, Department of CSE, University at Buffalo (kaiyiji@buffalo.edu); Yingbin Liang, Department of ECE, The Ohio State University (liang.889@osu.edu) |
| Pseudocode | Yes | Algorithm 1 Partial Zeroth-Order-like Bilevel Optimizer (PZOBO); a sketch of its core hypergradient step appears below the table. |
| Open Source Code | No | The paper does not include an explicit statement or link to its own open-source code for the methodology described. |
| Open Datasets | Yes | few-shot meta-learning with a ResNet-12 network over the miniImageNet dataset, deep HR with a LeNet network [32] on the MNIST dataset, and hyperparameter optimization (HO) on the 20 Newsgroup dataset. |
| Dataset Splits | Yes | where $X_2 \in \mathbb{R}^{n_2 \times m}$ and $X_1 \in \mathbb{R}^{n_1 \times m}$ are matrices of synthesized training and validation data, and $Y_2 \in \mathbb{R}^{n_2}$, $Y_1 \in \mathbb{R}^{n_1}$ are the corresponding response vectors. Hyperparameter optimization (HO) is the problem of finding the set of the best hyperparameters (either representational or regularization parameters) that yield the optimal value of some criterion of model quality (usually a validation loss on unseen data). The implied train/validation split structure is rendered below the table. |
| Hardware Specification | Yes | We run all models using a single NVIDIA Tesla P100 GPU. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We compare our PZOBO algorithm with the baseline bilevel optimizers AID-FP, AID-CG, ITD-R, and HOZOG (see Appendix E.1 for details about the baseline algorithms and hyperparameters used). The dataset and hyperparameter details can be found in Appendix E.4. |
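
The Pseudocode row above points to Algorithm 1 (PZOBO). Below is a minimal NumPy sketch of the core Hessian-free step as we read it: the response Jacobian $\partial y^*(x)/\partial x$ is never formed; instead, finite differences of the inner-loop output under Gaussian perturbations of $x$ give a zeroth-order-like estimate of the Jacobian-vector product in the hypergradient. The callables `grad_g_y`, `grad_f_x`, `grad_f_y` and the constants `mu`, `Q`, `alpha` are illustrative placeholders, not the paper's API, and `x`, `y` are assumed to be 1-D vectors.

```python
import numpy as np

def inner_gd(x, y0, grad_g_y, steps=10, alpha=0.1):
    """Run `steps` gradient-descent steps on the inner objective g(x, .)."""
    y = y0.copy()
    for _ in range(steps):
        y -= alpha * grad_g_y(x, y)
    return y

def pzobo_hypergrad(x, y0, grad_g_y, grad_f_x, grad_f_y, mu=1e-3, Q=5, rng=None):
    """Hessian-free hypergradient estimate in the spirit of PZOBO (Algorithm 1).

    Finite differences of the inner-loop output under Q Gaussian
    perturbations of x stand in for the response Jacobian, so no
    second-order derivatives are needed.
    """
    if rng is None:
        rng = np.random.default_rng()
    y_N = inner_gd(x, y0, grad_g_y)                  # y^N(x): approximate inner solution
    v = grad_f_y(x, y_N)                             # outer gradient w.r.t. y
    jvp_term = np.zeros_like(x)
    for _ in range(Q):
        u = rng.standard_normal(x.shape)             # Gaussian perturbation direction
        y_pert = inner_gd(x + mu * u, y0, grad_g_y)  # inner solution at perturbed x
        delta = (y_pert - y_N) / mu                  # finite-difference response estimate
        jvp_term += np.dot(delta, v) * u             # <delta, grad_y f> * u
    return grad_f_x(x, y_N) + jvp_term / Q           # hypergradient estimate
```

Note that the perturbed and unperturbed inner loops start from the same initialization `y0`; as we understand the method, sharing the starting point is what makes the finite difference approximate the response Jacobian-vector product rather than noise from differing inner trajectories.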
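
The Dataset Splits row quotes the shallow HR setup, where the training data $(X_2, Y_2)$ feed the inner problem and the validation data $(X_1, Y_1)$ the outer one. A plausible rendering of that bilevel structure, assuming the standard hyper-representation form with an embedding model $T(\cdot;\lambda)$ and ridge coefficient $\rho$ (our notation, not necessarily the paper's exact formulation):

```latex
% Outer problem: validation loss of the learned head w^*(\lambda);
% inner problem: ridge-regularized training loss over the head w.
\min_{\lambda} \; \frac{1}{n_1} \bigl\| T(X_1;\lambda)\, w^*(\lambda) - Y_1 \bigr\|^2
\quad \text{s.t.} \quad
w^*(\lambda) = \arg\min_{w} \; \frac{1}{n_2} \bigl\| T(X_2;\lambda)\, w - Y_2 \bigr\|^2 + \rho \|w\|^2
```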