A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

Authors: Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quoting Section 4 (Numerical experiments): "In this section, we evaluate the performance of the SUSTAIN algorithm on two popular machine learning tasks: hyperparameter optimization and meta-learning. In Figure 1, we compare the performance of different algorithms when the dataset has a corruption probability of 0.3. As observed, SUSTAIN outperforms stocBiO and HOAG." (A hedged sketch of the label-corruption setup appears after the table.)
Researcher Affiliation | Academia | Prashant Khanduri, University of Minnesota (khand095@umn.edu); Siliang Zeng, University of Minnesota (zeng0176@umn.edu); Mingyi Hong, University of Minnesota (mhong@umn.edu); Hoi-To Wai, CUHK (htwai@se.cuhk.edu.hk); Zhaoran Wang, Northwestern University (zhaoranwang@gmail.com); Zhuoran Yang, Princeton University (zy6@princeton.edu)
Pseudocode | Yes | "Algorithm 1: The Proposed SUSTAIN Algorithm" (a hedged sketch of a double-momentum update in this style appears after the table)
Open Source Code | No | The paper uses the learn2learn library (available: https://github.com/learnables/learn2learn), which is a third-party tool. There is no explicit statement or link indicating that the authors' own code for the SUSTAIN algorithm or experiments is publicly available.
Open Datasets | Yes | "The problem is trained on the Fashion-MNIST dataset [41] with 50k, 10k, and 10k image samples allocated for training, validation and testing purposes, respectively. We consider a few-shot meta-learning problem [11, 30] (cf. (2)) and compare the performance of SUSTAIN to ITD-BiO [19] and ANIL [30]. The task of interest is 5-way 5-shot learning and we conduct experiments on the miniImageNet dataset [39, 32] with 100 classes and 600 images per class."
Dataset Splits | Yes | "The problem is trained on the Fashion-MNIST dataset [41] with 50k, 10k, and 10k image samples allocated for training, validation and testing purposes, respectively. We apply learn2learn [1] (available: https://github.com/learnables/learn2learn) to partition the 100 classes from miniImageNet into subsets of 64, 16 and 20 for meta-training, meta-validation and meta-testing, respectively." (A hedged split sketch appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using learn2learn and implementing a 4-layer convolutional neural network (CNN) with ReLU activation, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "The step sizes for different algorithms are chosen according to their theoretically suggested values. Let the outer iteration be indexed by t. For SUSTAIN we choose η_t = β_t = 0.1/(1 + t)^(1/3) and tune for c_f and c_g (see Theorem 3.2); for stocBiO and HOAG we select η_t = d_η, β_t = d_β and tune the parameters d_η and d_β in the range [0, 1]. For ANIL and ITD-BiO, we use the parameter selection suggested in [1, 19]. Specifically, for ANIL, we use an inner-loop stepsize of 0.1 and an outer-loop (meta) stepsize of 0.002. For ITD-BiO, we choose the inner-loop stepsize as 0.05 and the outer-loop stepsize as 0.005. For SUSTAIN, we choose the outer-loop stepsize η_t as η/(1 + t)^(1/3) with η ∈ [0.1, 1], we choose the momentum parameter β_t as c·η_t^2/η^2 and tune for c ∈ {2, 5, 10, 15, 20}; finally, we fix the inner stepsize as 0.05." (A hedged sketch of these schedules appears after the table.)
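
The hyperparameter-optimization experiment quoted above uses a dataset with "corruption probability of 0.3". The excerpt does not state the exact corruption scheme, so the following is a minimal sketch under the common assumption that each training label is replaced by a uniformly random class with probability p; the function name and seed are illustrative, not from the paper.

```python
import numpy as np

def corrupt_labels(labels, p=0.3, num_classes=10, seed=0):
    """Flip each label to a uniformly random class with probability p.

    Hypothetical reconstruction of the 'corruption probability 0.3'
    setting quoted above; the paper's exact corruption scheme may differ.
    """
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    mask = rng.random(len(labels)) < p               # which samples to corrupt
    labels[mask] = rng.integers(0, num_classes, mask.sum())
    return labels
```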
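The report only quotes the title of Algorithm 1, so the following is a minimal sketch of a STORM-style recursive-momentum gradient estimator applied at both levels, which is the "double momentum" idea the paper's title refers to. It is not the authors' exact pseudocode: the hypergradient construction is abstracted into a callable, stochastic sampling is elided, and the same stepsize is used for both levels for brevity.

```python
def storm_style_update(d_prev, grad_new, grad_old, beta):
    """One recursive-momentum (STORM-style) estimator step:
    d_t = grad(z_t) + (1 - beta) * (d_{t-1} - grad(z_{t-1})).
    SUSTAIN maintains one such estimator per level ('double momentum');
    this is an illustrative sketch, not the paper's Algorithm 1."""
    return grad_new + (1.0 - beta) * (d_prev - grad_old)

def sustain_sketch(x, y, hypergrad, inner_grad, T, eta0=0.1):
    """Hypothetical single-loop skeleton with momentum estimators dx, dy."""
    dx, dy = hypergrad(x, y), inner_grad(x, y)
    for t in range(T):
        eta = eta0 / (1 + t) ** (1 / 3)              # stepsize from the quoted setup
        beta = min(1.0, eta ** 2 / eta0 ** 2)        # assumed momentum scaling
        x_new, y_new = x - eta * dx, y - eta * dy
        dx = storm_style_update(dx, hypergrad(x_new, y_new), hypergrad(x, y), beta)
        dy = storm_style_update(dy, inner_grad(x_new, y_new), inner_grad(x, y), beta)
        x, y = x_new, y_new
    return x, y

# Toy usage with stand-in quadratic gradients (illustrative only):
x_star, y_star = sustain_sketch(
    x=1.0, y=1.0,
    hypergrad=lambda x, y: x + 0.5 * y,              # stand-in for the hypergradient
    inner_grad=lambda x, y: y - x,                   # stand-in for the inner gradient
    T=100,
)
```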
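For the 64/16/20 class partition of miniImageNet, a sketch using learn2learn's benchmark helper is shown below. This assumes the `get_tasksets` convenience API, which applies the standard meta-train/validation/test class split internally; the exact signature and sample counts may vary across learn2learn versions, so treat this as an assumption to check against the library's documentation rather than the authors' script.

```python
import learn2learn as l2l

# Hedged sketch: learn2learn partitions miniImageNet's 100 classes into
# 64 / 16 / 20 for meta-training, meta-validation and meta-testing.
tasksets = l2l.vision.benchmarks.get_tasksets(
    "mini-imagenet",
    train_ways=5,        # 5-way tasks, as in the quoted setup
    train_samples=10,    # assumed: 5 support + 5 query shots per class
    test_ways=5,
    test_samples=10,
    root="~/data",
)
images, labels = tasksets.train.sample()  # one sampled 5-way task
```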
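The SUSTAIN schedules in the Experiment Setup row are simple closed forms, sketched below. The η_t schedule is quoted directly; the β_t form is reconstructed from garbled extraction text, and the clipping to 1 is an added assumption to keep the momentum parameter valid.

```python
def sustain_schedules(T, eta=0.1, c=5):
    """Stepsize and momentum schedules from the quoted setup:
    eta_t = eta / (1 + t)^(1/3) and beta_t = c * eta_t^2 / eta^2.
    The beta_t form is reconstructed from garbled text; the paper tunes
    c over {2, 5, 10, 15, 20} and eta over [0.1, 1]."""
    for t in range(T):
        eta_t = eta / (1 + t) ** (1 / 3)
        beta_t = min(1.0, c * eta_t ** 2 / eta ** 2)  # clip to a valid momentum
        yield eta_t, beta_t

# First few values with the defaults eta = 0.1, c = 5:
for t, (eta_t, beta_t) in zip(range(3), sustain_schedules(100)):
    print(t, round(eta_t, 4), round(beta_t, 4))
```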