Will Bilevel Optimizers Benefit from Loops

Authors: Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "6 Empirical Verification: Experiments on hyperparameter optimization on MNIST. We first conduct experiments to verify our theoretical results in Corollaries 1, 2, 3 and 4 on AID-BiO with different implementations. We consider the following hyperparameter optimization problem." Figure 1: Training & test losses vs. time (seconds) by AID-BiO on MNIST with different Q and N. |
| Researcher Affiliation | Academia | Kaiyi Ji, Department of CSE, University at Buffalo (kaiyiji@buffalo.edu); Mingrui Liu, Department of CS, George Mason University (mingruil@gmu.edu); Yingbin Liang, Department of ECE, The Ohio State University (liang.889@osu.edu); Lei Ying, Department of EECS, University of Michigan (leiying@umich.edu) |
| Pseudocode | Yes | Algorithm 1: AID-based bilevel optimization (AID-BiO) with double warm starts; Algorithm 2: ITD-based bilevel optimization algorithm (ITD-BiO) with warm start |
| Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]" |
| Open Datasets | Yes | "Experiments on hyperparameter optimization on MNIST... The datasets we use are public. See Appendix E for details." |
| Dataset Splits | No | The paper mentions "Dval" (validation set) and "Dtr" (training set) in Section 6, but does not give specific split percentages or sample counts in the main text. |
| Hardware Specification | No | The main text does not specify the hardware used for the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The main text does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | "In Figure 2, ... we choose loop sizes Q and N from {1, 50}. In Figure 3, ... with different choices of N from {1, 20}. For ITD-BiO, we choose N = 20 for N-N-loop ITD and N = 1 for No-loop ITD. The results are reported with the best-tuned hyperparameters." |
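To make the structure of Algorithm 1 concrete — an outer update driven by an N-step inner loop and a Q-step linear-system loop, both warm-started — here is a minimal sketch of AID-style bilevel optimization on a toy quadratic problem. This is an illustration only, not the authors' implementation: the function `aid_bio`, the quadratic losses, and the step sizes `alpha`/`beta` are all illustrative choices, not taken from the paper.

```python
import numpy as np

def aid_bio(H, y_target, outer_steps=500, N=5, Q=5, alpha=0.5, beta=0.5):
    """Sketch of AID-style bilevel optimization with double warm starts.

    Toy problem (illustrative, not from the paper):
      inner:  g(x, y) = 0.5 * y^T H y - x^T y     (minimized over y)
      outer:  f(x, y) = 0.5 * ||y - y_target||^2  (minimized over x)

    The AID hypergradient needs v solving (grad_y^2 g) v = grad_y f,
    i.e. H v = y - y_target; it is approximated with Q gradient steps.
    Both y and v are warm-started across outer iterations.
    """
    d = len(y_target)
    x = np.zeros(d)
    y = np.zeros(d)  # warm-started inner variable
    v = np.zeros(d)  # warm-started linear-system solution
    for _ in range(outer_steps):
        # N-step inner loop: gradient descent on g(x, .)
        for _ in range(N):
            y -= alpha * (H @ y - x)
        # Q-step loop: approximately solve H v = grad_y f(x, y)
        grad_y_f = y - y_target
        for _ in range(Q):
            v -= alpha * (H @ v - grad_y_f)
        # hypergradient: grad_x f - (grad_x grad_y g) v = 0 - (-I) v = v
        x -= beta * v
    return x, y
```

For this quadratic instance the inner minimizer is y*(x) = H^{-1} x, so the outer problem is solved at x = H y_target, which the iterates approach; varying `N` and `Q` here mimics the loop-size comparison the paper studies on MNIST.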