Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling
Authors: Junyi Li, Heng Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our algorithms over both synthetic and real-world applications. Numerical results clearly showcase the superiority of our algorithms. See also Section 5, Applications and Numerical Experiments. |
| Researcher Affiliation | Academia | Junyi Li and Heng Huang, Department of Computer Science, Institute of Health Computing, University of Maryland College Park, College Park, MD 20742 |
| Pseudocode | Yes | Algorithm 1 Without-Replacement Bilevel Optimization (WiOR-BO), Algorithm 2 Without-Replacement Conditional Bilevel Optimization (WiOR-CBO). A hedged sketch of the without-replacement sampling idea appears after the table. |
| Open Source Code | Yes | The datasets used in experiments are publicly available and we include the code implementation in the supplementary material. (Answer to the checklist question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?) |
| Open Datasets | Yes | We construct datasets based on MNIST [29]. and We consider the Omniglot [27] and MiniImageNet [41] data sets. |
| Dataset Splits | Yes | For the training set, we randomly sample 40000 images from the original training dataset and then randomly perturb a fraction of labels of samples. For the validation set, we randomly select 5000 clean images from the original training dataset. and ...for each character, we sample K samples for training and 15 samples for validation. A hypothetical reconstruction of the MNIST split is sketched after the table. |
| Hardware Specification | Yes | Our experiments were conducted on servers equipped with 8 NVIDIA A5000 GPUs. |
| Software Dependencies | No | The code is written in PyTorch. (No PyTorch version is specified.) |
| Experiment Setup | Yes | During training, we use inner and outer learning rates of 0.001. (A.1), We choose inner learning rates (γ, ρ) as 0.1 and outer learning rate η as 1000. (A.2), For the experiments, we use inner learning rates 0.4 and outer learning rates 0.1 for Omniglot-related experiments and inner learning rates 0.01 and outer learning rates 0.05 for MiniImageNet-related experiments. We perform 4 inner gradient descent steps and set Kmax = 6 for the RT-MLMC method. (A.3) |
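
The pseudocode row only names Algorithms 1 and 2 (WiOR-BO and WiOR-CBO); the paper itself defines the exact update rules. As a rough illustration of the core idea, the following is a minimal PyTorch sketch of a bilevel loop that draws minibatches without replacement (one fresh permutation per epoch) for both the inner and outer problems. The unrolled-differentiation hypergradient, the loss signatures, and the default hyperparameters (the 0.001 learning rates and 4 inner steps quoted above) are assumptions for illustration, not the authors' WiOR-BO implementation.

```python
import torch

def shuffled_batches(n, batch_size, generator=None):
    """Without-replacement sampling: each epoch is a fresh random permutation
    of the n sample indices, split into consecutive minibatches."""
    while True:
        perm = torch.randperm(n, generator=generator)
        for start in range(0, n, batch_size):
            yield perm[start:start + batch_size]

def bilevel_wor_sketch(x, y, inner_loss, outer_loss, n_train, n_val,
                       inner_lr=1e-3, outer_lr=1e-3,   # A.1 values, assumed defaults
                       inner_steps=4, outer_iters=100, batch_size=64):
    """Hypothetical sketch only: x is the outer variable, y the inner variable.
    inner_loss(x, y, idx) and outer_loss(x, y, idx) are assumed minibatch losses
    that depend on both variables. The hypergradient is obtained by unrolled
    differentiation through the inner steps, not by the paper's estimator."""
    train_stream = shuffled_batches(n_train, batch_size)
    val_stream = shuffled_batches(n_val, batch_size)
    for _ in range(outer_iters):
        y_t = y
        for _ in range(inner_steps):
            g_y = torch.autograd.grad(inner_loss(x, y_t, next(train_stream)),
                                      y_t, create_graph=True)[0]
            y_t = y_t - inner_lr * g_y      # keep the graph so dy_t/dx is tracked
        g_x = torch.autograd.grad(outer_loss(x, y_t, next(val_stream)), x)[0]
        x = (x - outer_lr * g_x).detach().requires_grad_(True)
        y = y_t.detach().requires_grad_(True)
    return x, y
```

The part the paper analyzes is the sampler: within an epoch every sample is visited exactly once, in contrast to independent with-replacement draws.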
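The dataset-splits row quotes the data hyper-cleaning setup: 40,000 MNIST training images with a fraction of labels perturbed, plus 5,000 clean validation images from the same original training set. Below is a hypothetical reconstruction of that split; the noise fraction, the perturbation rule, and the seed are assumptions, since the quoted description does not specify them.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def build_hypercleaning_splits(root="./data", noise_fraction=0.4, seed=0):
    """Assumed reconstruction of the MNIST data-cleaning splits: 40,000
    training images with noisy labels and 5,000 clean validation images,
    both drawn from the original 60,000-image training set."""
    g = torch.Generator().manual_seed(seed)
    to_tensor = transforms.ToTensor()
    clean = datasets.MNIST(root, train=True, download=True, transform=to_tensor)
    noisy = datasets.MNIST(root, train=True, transform=to_tensor)

    perm = torch.randperm(len(clean), generator=g)
    train_idx, val_idx = perm[:40000], perm[40000:45000]

    # Perturb a fraction of the training labels to a different random class.
    targets = noisy.targets.clone()
    n_noisy = int(noise_fraction * len(train_idx))
    hit = train_idx[torch.randperm(len(train_idx), generator=g)[:n_noisy]]
    targets[hit] = (targets[hit] + torch.randint(1, 10, (n_noisy,), generator=g)) % 10
    noisy.targets = targets

    return Subset(noisy, train_idx.tolist()), Subset(clean, val_idx.tolist())
```

Keeping two separate MNIST instances ensures the validation subset retains the original clean labels while only the training copy's labels are overwritten.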