Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling

Authors: Junyi Li, Heng Huang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate our algorithms over both synthetic and real-world applications. Numerical results clearly showcase the superiority of our algorithms." and Section 5, "Applications and Numerical Experiments".
Researcher Affiliation | Academia | Junyi Li and Heng Huang, Department of Computer Science, Institute of Health Computing, University of Maryland College Park, College Park, MD 20742
Pseudocode | Yes | Algorithm 1: Without-Replacement Bilevel Optimization (WiOR-BO); Algorithm 2: Without-Replacement Conditional Bilevel Optimization (WiOR-CBO). (A generic sketch of without-replacement sampling in a bilevel loop is given after this table.)
Open Source Code | Yes | "the datasets used in experiments are publicly available and we include the code implementation in the supplementary material." (In answer to the checklist question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?")
Open Datasets | Yes | "We construct datasets based on MNIST [29]." and "We consider the Omniglot [27] and MiniImageNet [41] data sets."
Dataset Splits | Yes | "For the training set, we randomly sample 40000 images from the original training dataset and then randomly perturb a fraction of labels of samples. For the validation set, we randomly select 5000 clean images from the original training dataset." and "...for each character, we sample K samples for training and 15 samples for validation." (A sketch of the MNIST split construction appears after this table.)
Hardware Specification | Yes | "Our experiments were conducted on servers equipped with 8 NVIDIA A5000 GPUs."
Software Dependencies | No | "The code is written in Pytorch." (No PyTorch version is specified.)
Experiment Setup | Yes | "During training, we use both the inner and outer learning rates of 0.001." (A.1); "We choose inner learning rate (γ, ρ) as 0.1 and outer learning rate η as 1000." (A.2); "For the experiments, we use inner learning rates 0.4 and outer learning rates 0.1 for Omniglot-related experiments and inner learning rates 0.01 and outer learning rates 0.05 for MiniImageNet-related experiments. We perform 4 inner gradient descent steps and set Kmax = 6 for the RT-MLMC method." (A.3) (The quoted hyperparameters are collected in a short reference block after this table.)
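
For context on the algorithms listed under Pseudocode, the following is a minimal sketch of the without-replacement (epoch-shuffled) sampling pattern inside an alternating stochastic bilevel update. It is not the authors' WiOR-BO implementation: the toy quadratic losses, step sizes, and the omission of the second-order hypergradient correction are simplifying assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' exact WiOR-BO code): illustrates
# without-replacement (epoch-shuffled) sampling in an alternating
# stochastic bilevel update on a toy quadratic problem.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n inner-level samples and m outer-level samples in R^d.
n, m, d = 100, 50, 5
A = rng.normal(size=(n, d))      # inner-level data
B = rng.normal(size=(m, d))      # outer-level data

def inner_grad(x, y, i):
    """Gradient in y of g_i(x, y) = 0.5 * ||y - A_i - x||^2 (toy inner loss)."""
    return y - A[i] - x

def outer_grad(x, y, j):
    """Partial gradient in x of f_j(x, y) = 0.5 * ||x + y - B_j||^2 (toy outer loss).
    A full hypergradient would add the implicit dy*/dx correction term,
    which is omitted to keep this sketch short."""
    return x + y - B[j]

x = np.zeros(d)          # outer (upper-level) variable
y = np.zeros(d)          # inner (lower-level) variable
gamma, eta = 0.1, 0.01   # inner / outer step sizes (illustrative values only)

for epoch in range(20):
    # Without-replacement sampling: shuffle each index set once per epoch
    # and sweep through it, instead of drawing i.i.d. samples with replacement.
    inner_perm = rng.permutation(n)
    outer_perm = rng.permutation(m)
    for t in range(m):
        i = inner_perm[t % n]              # next unseen inner sample this epoch
        j = outer_perm[t]                  # next unseen outer sample this epoch
        y -= gamma * inner_grad(x, y, i)   # inner update
        x -= eta * outer_grad(x, y, j)     # outer update
```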
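
The MNIST label-noise split quoted under Dataset Splits can be built roughly as below. The noise fraction, random seed, and use of torchvision.datasets.MNIST are assumptions; the paper's supplementary code may differ in these details.

```python
# Hedged sketch of the quoted split: 40,000 training images with perturbed
# labels and 5,000 clean validation images, both drawn from the MNIST
# training set. The noise fraction is an assumption, not taken from the paper.
import numpy as np
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

rng = np.random.default_rng(0)
noise_fraction = 0.4   # assumption: the paper only says "a fraction of labels"

mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())

# Disjoint index sets drawn from the 60,000 MNIST training images.
perm = rng.permutation(len(mnist))
train_idx, val_idx = perm[:40000], perm[40000:45000]

# Perturb the labels of a random fraction of the training subset.
n_noisy = int(noise_fraction * len(train_idx))
noisy_idx = torch.as_tensor(rng.choice(train_idx, size=n_noisy, replace=False))
mnist.targets[noisy_idx] = torch.randint(0, 10, (n_noisy,))  # may occasionally keep the true label

train_set = Subset(mnist, train_idx.tolist())  # noisy training labels
val_set = Subset(mnist, val_idx.tolist())      # validation labels left clean
```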
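
For quick reference, the hyperparameters quoted under Experiment Setup are collected below, grouped by the appendix section they come from. The grouping and key names are editorial, not taken from the released code.

```python
# Hyperparameter values as quoted in Appendix A.1-A.3; keys are editorial.
hyperparams = {
    "A.1": {"inner_lr": 1e-3, "outer_lr": 1e-3},
    "A.2": {"inner_lr_gamma_rho": 0.1, "outer_lr_eta": 1000},
    "A.3_omniglot": {"inner_lr": 0.4, "outer_lr": 0.1,
                     "inner_steps": 4, "rt_mlmc_Kmax": 6},
    "A.3_miniimagenet": {"inner_lr": 0.01, "outer_lr": 0.05,
                         "inner_steps": 4, "rt_mlmc_Kmax": 6},
}
```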