BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization

Authors: Chen Fan, Gaspard Choné-Ducasse, Mark Schmidt, Christos Thrampoulidis

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, our extensive experiments demonstrate that the new algorithms, which are available in both SGD and Adam versions, can find large learning rates with minimal tuning and converge faster than corresponding vanilla SGD or Adam BO algorithms that require fine-tuning.
Researcher Affiliation | Academia | 1 University of British Columbia, 2 Ecole Normale Supérieure, 3 Canada CIFAR AI Chair (Amii)
Pseudocode | Yes | Algorithm 1 (BiSLS-Adam/SGD) and Algorithm 2 (reset). A hedged sketch of the underlying SLS/SPS step-size rules is given after this table.
Open Source Code | No | The paper does not contain an explicit statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | The experiments are performed on MNIST dataset using LeNet [26, 42]. ... Binary linear classification on w8a dataset using logistic loss [3].
Dataset Splits | Yes | Validation loss against upper-level iterations for different values of β (left, α = 0.005) and α (right, β = 0.01). ... where (X1, Y1) and (X2, Y2) are validation and training data sets with sizes DX1 and DX2, respectively. A generic statement of the corresponding bi-level objective is sketched after this table.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models; it refers to computation only in general terms.
Software Dependencies | No | The paper mentions optimization methods common in deep learning, such as Adam and SGD variants, but does not specify versions for any software components.
Experiment Setup | Yes | For constant-step SGD and Adam, we tune the lower-level learning rate β ∈ {10.0, 5.0, 1.0, 0.5, 0.1, 0.05, 0.01}. For the upper-level learning rate, we tune α ∈ {0.001, 0.0025, 0.005, 0.01, 0.05, 0.1} for SGD, and α ∈ {10⁻⁵, 5×10⁻⁵, 10⁻⁴, 5×10⁻⁴, 0.001, 0.01} for Adam. A sketch of this grid search follows the table.
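The paper's Algorithm 1 and Algorithm 2 are not reproduced in this summary. As a point of reference, the sketch below shows the standard single-level SPS (stochastic Polyak step size) and SLS (stochastic Armijo line search) rules that the BiSLS/BiSPS methods adapt to the bi-level setting. The function names, default constants (c, beta, gamma_max, gamma_init), and the use of plain NumPy are illustrative assumptions, not the paper's exact pseudocode or hyperparameters.

```python
import numpy as np

def sps_step_size(loss_val, grad, c=0.5, f_star=0.0, gamma_max=10.0):
    """Stochastic Polyak step size on a sampled batch:
    gamma = (f_i(w) - f_i*) / (c * ||grad f_i(w)||^2), capped at gamma_max.
    Standard single-level form; the paper applies analogous rules per level."""
    denom = c * float(np.dot(grad, grad)) + 1e-12  # guard against a zero gradient
    return min((loss_val - f_star) / denom, gamma_max)

def sls_step_size(w, loss_fn, grad, gamma_init=10.0, c=0.1, beta=0.9, max_backtracks=50):
    """Stochastic line search on a sampled batch: start from a large step size and
    backtrack until the Armijo condition
    f_i(w - gamma * g) <= f_i(w) - c * gamma * ||g||^2 holds."""
    gamma = gamma_init
    f_w = loss_fn(w)
    g_norm_sq = float(np.dot(grad, grad))
    for _ in range(max_backtracks):
        if loss_fn(w - gamma * grad) <= f_w - c * gamma * g_norm_sq:
            break
        gamma *= beta  # shrink the step size and retry
    return gamma
```

Both rules return a step size computed from the current batch, which is what allows the bi-level algorithms to start from large learning rates without manual tuning.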
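The Dataset Splits row quotes validation data (X1, Y1) and training data (X2, Y2) with sizes DX1 and DX2. A generic bi-level objective consistent with that split is written out below; the symbols F, ℓ, g, λ, and w are placeholders and the paper's exact notation may differ.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Upper level: validation loss on (X_1, Y_1); lower level: training loss on (X_2, Y_2).
\begin{align*}
\min_{\lambda}\ & F(\lambda) = \frac{1}{D_{X_1}} \sum_{(x,y) \in (X_1, Y_1)} \ell\big(w^*(\lambda);\, x, y\big) \\
\text{s.t.}\ & w^*(\lambda) \in \operatorname*{arg\,min}_{w}\ \frac{1}{D_{X_2}} \sum_{(x,y) \in (X_2, Y_2)} g\big(w, \lambda;\, x, y\big)
\end{align*}
\end{document}
```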
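The grids below are taken directly from the Experiment Setup row; the tuning loop itself is a minimal sketch, and `run_bilevel_sgd` is a hypothetical callable standing in for one full bi-level training run that returns a final validation loss.

```python
from itertools import product

# Learning-rate grids quoted in the Experiment Setup row for the constant-step baselines.
betas = [10.0, 5.0, 1.0, 0.5, 0.1, 0.05, 0.01]        # lower-level learning rates
alphas_sgd = [0.001, 0.0025, 0.005, 0.01, 0.05, 0.1]   # upper-level learning rates (SGD)
alphas_adam = [1e-5, 5e-5, 1e-4, 5e-4, 0.001, 0.01]    # upper-level learning rates (Adam)

def tune_constant_step(run_bilevel_sgd):
    """Exhaustive grid search over (alpha, beta); returns the pair with the
    lowest final validation loss reported by the hypothetical training callable."""
    return min(product(alphas_sgd, betas),
               key=lambda ab: run_bilevel_sgd(alpha=ab[0], beta=ab[1]))
```

This 6 x 7 grid (42 runs per optimizer) is the tuning cost that the BiSLS/BiSPS variants aim to avoid by finding step sizes automatically.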