Convergence of Bayesian Bilevel Optimization

Authors: Shi Fu, Fengxiang He, Xinmei Tian, Dacheng Tao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This paper presents the first theoretical guarantee for Bayesian bilevel optimization (BBO), the term we use for the prevalent bilevel framework that combines Bayesian optimization at the outer level to tune hyperparameters with inner-level stochastic gradient descent (SGD) to train the model. We prove sublinear regret bounds suggesting simultaneous convergence of the inner-level model parameters and outer-level hyperparameters to configurations that are optimal for generalization. A pivotal technical novelty in the proofs is modeling the excess risk of the SGD-trained parameters as evaluation noise during Bayesian optimization. Our theory implies that the inner unit horizon, defined as the number of SGD iterations, shapes the convergence behavior of BBO. This yields practical guidance on configuring the inner unit horizon to enhance training efficiency and model performance. (...) 5 EXPERIMENTS We conducted numerical experiments in this section. In the inner level, we employ SGD to train a CNN with two convolutional layers and one fully connected layer on the MNIST dataset. In the outer level, Bayesian optimization uses the EI and UCB acquisition functions to adjust hyperparameters such as the learning rate. We fix the number of iterations for the outer-level BO and compare the number of iterations for the inner-level SGD under different scenarios, along with their respective convergence outcomes, as detailed in the table below."
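To make the assessed setup concrete, the following is a minimal, self-contained sketch (not the authors' code) of the bilevel loop the paper analyzes: the outer level runs Gaussian-process Bayesian optimization over the log learning rate with a confidence-bound acquisition, and each outer evaluation comes from an inner routine standing in for a fixed horizon of SGD steps. The function inner_sgd_validation_loss, the search range, and all constants are hypothetical stand-ins; the noise scale shrinking with the inner horizon mirrors the paper's idea of treating the excess risk of the SGD-trained parameters as evaluation noise.

```python
# Sketch of a Bayesian bilevel loop: outer GP-based BO over log10(learning rate),
# inner-level SGD represented by a hypothetical noisy objective.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def inner_sgd_validation_loss(log_lr, n_sgd_iters):
    # Hypothetical stand-in for "train with SGD for n_sgd_iters steps, return val loss".
    # Noise shrinks with the inner unit horizon, echoing the excess-risk-as-noise view.
    noise = rng.normal(scale=1.0 / np.sqrt(n_sgd_iters))
    return (log_lr + 2.5) ** 2 + 0.1 + noise   # pretend the optimum is near lr = 10**-2.5

def lcb(mu, sigma, beta=2.0):
    # Confidence-bound acquisition written for minimization.
    return mu - beta * sigma

candidates = np.linspace(-5.0, 0.0, 200).reshape(-1, 1)  # grid over log10(lr)
X, y = [], []
n_outer_steps, n_sgd_iters = 20, 1000

for t in range(n_outer_steps):
    if t < 3:  # a few random initial evaluations before fitting the surrogate
        x_next = candidates[rng.integers(len(candidates))]
    else:
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.array(X), np.array(y))
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmin(lcb(mu, sigma))]
    X.append(x_next)
    y.append(inner_sgd_validation_loss(x_next[0], n_sgd_iters))

best = X[int(np.argmin(y))][0]
print(f"best log10(lr) found: {best:.2f}, val loss: {min(y):.3f}")
```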
Researcher Affiliation | Academia | Shi Fu (1), Fengxiang He (2), Xinmei Tian (1), Dacheng Tao (3); 1: University of Science and Technology of China, 2: University of Edinburgh, 3: Nanyang Technological University
Pseudocode | Yes | Appendix A, "Bayesian Bilevel Optimization Algorithms" (...) Algorithm 1: Bayesian bilevel optimization with EI acquisition (...) Algorithm 2: Bayesian bilevel optimization with UCB acquisition
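The two acquisition rules named in the pseudocode are standard; below is a hedged sketch of how they are typically computed for a minimization objective (a generic formulation, not copied from the paper's Algorithms 1 and 2).

```python
# Standard EI and confidence-bound acquisition scores for minimization.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    # EI: expected amount by which a candidate improves on the incumbent value.
    sigma = np.maximum(sigma, 1e-12)
    z = (best_so_far - mu - xi) / sigma
    return (best_so_far - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def lower_confidence_bound(mu, sigma, beta=2.0):
    # GP-UCB written for minimization: smaller score = more promising candidate.
    return mu - beta * sigma

# Toy usage: pick the next hyperparameter index from posterior means/stds on a grid.
mu = np.array([0.9, 0.7, 0.8])
sigma = np.array([0.05, 0.20, 0.10])
print("EI picks index", int(np.argmax(expected_improvement(mu, sigma, best_so_far=0.75))))
print("UCB picks index", int(np.argmin(lower_confidence_bound(mu, sigma))))
```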
Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | "In the inner level, we employ SGD to train a CNN with two convolutional layers and one fully connected layer on the MNIST dataset."
Dataset Splits | No | The paper mentions a training dataset (Str) and a validation set (Sval), and Theorem 1 states a condition on N in terms of n and m, the sizes of these sets. However, it does not give specific counts or percentages for the splits relative to the full dataset, nor does it point to predefined MNIST splits that would allow the data partitioning to be reproduced.
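A reproduction would therefore have to choose its own split. The snippet below is one hypothetical choice, not taken from the paper: hold out 10,000 of the 60,000 MNIST training images as the validation set used by the outer-level objective.

```python
# Hypothetical MNIST train/validation split (sizes are assumptions, not from the paper).
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
n_val = 10_000                           # assumed size of Sval
n_train = len(full_train) - n_val        # remaining 50,000 images as Str
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set))
```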
Hardware Specification | No | The paper states "We conducted numerical experiments in this section." (Section 5) but does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions SGD, a CNN, Bayesian optimization with EI and UCB acquisition functions, and the MNIST dataset, but it does not specify versions for any software libraries (e.g., PyTorch, TensorFlow, scikit-learn) or for the programming language (e.g., Python 3.x).
Experiment Setup | Yes | "In the inner level, we employ SGD to train a CNN with two convolutional layers and one fully connected layer on the MNIST dataset. In the outer level, Bayesian optimization uses the EI and UCB acquisition functions to adjust hyperparameters such as the learning rate. We fix the number of iterations for the outer-level BO and compare the number of iterations for the inner-level SGD under different scenarios, along with their respective convergence outcomes, as detailed in the table below." (1) With the number of outer Bayesian optimization steps set to 20:

SGD iterations    100           500           1,000         2,000         3,000
Performance       3.15 ± 0.75   2.73 ± 0.03   2.70 ± 0.04   2.43 ± 0.12   2.46 ± 0.13
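For reference, here is a minimal PyTorch sketch of the inner level as described: a CNN with two convolutional layers and one fully connected layer, trained with plain SGD for a fixed inner unit horizon. Channel counts, kernel sizes, and the data-loading details are assumptions; the paper does not specify them.

```python
# Sketch of the inner-level model and SGD loop; architecture hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, 10)   # one fully connected layer

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 14x14 -> 7x7
        return self.fc(x.flatten(1))

def inner_sgd(lr, n_iters, train_loader, device="cpu"):
    # Run a fixed inner unit horizon of n_iters SGD steps and return the trained model.
    model = SmallCNN().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    data_iter = iter(train_loader)
    for _ in range(n_iters):
        try:
            xb, yb = next(data_iter)
        except StopIteration:
            data_iter = iter(train_loader)
            xb, yb = next(data_iter)
        opt.zero_grad()
        loss = F.cross_entropy(model(xb.to(device)), yb.to(device))
        loss.backward()
        opt.step()
    return model
```

A reproduction would call inner_sgd(lr, n_iters, train_loader) once per outer BO step, with n_iters set to 100, 500, 1,000, 2,000, or 3,000 as in the table above.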