A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
Authors: Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Numerical experiments): In this section, we evaluate the performance of the SUSTAIN algorithm on two popular machine learning tasks: hyperparameter optimization and meta-learning. In Figure 1, we compare the performance of different algorithms when the dataset has a corruption probability of 0.3. As observed, SUSTAIN outperforms stocBiO and HOAG. |
| Researcher Affiliation | Academia | Prashant Khanduri, University of Minnesota, khand095@umn.edu; Siliang Zeng, University of Minnesota, zeng0176@umn.edu; Mingyi Hong, University of Minnesota, mhong@umn.edu; Hoi-To Wai, CUHK, htwai@se.cuhk.edu.hk; Zhaoran Wang, Northwestern University, zhaoranwang@gmail.com; Zhuoran Yang, Princeton University, zy6@princeton.edu |
| Pseudocode | Yes | Algorithm 1: The Proposed SUSTAIN Algorithm (an illustrative double-momentum sketch appears after the table) |
| Open Source Code | No | The paper uses the learn2learn library (available: https://github.com/learnables/learn2learn), which is a third-party tool. There is no explicit statement or link indicating that the authors' own code for the SUSTAIN algorithm or experiments is made publicly available. |
| Open Datasets | Yes | The problem is trained on the Fashion-MNIST dataset [41] with 50k, 10k, and 10k image samples allocated for training, validation and testing purposes, respectively. We consider a few-shot meta-learning problem [11, 30] (cf. (2)) and compare the performance of SUSTAIN to ITD-BiO [19] and ANIL [30]. The task of interest is 5-way 5-shot learning and we conduct experiments on the miniImageNet dataset [39, 32] with 100 classes and 600 images per class. |
| Dataset Splits | Yes | The problem is trained on the Fashion-MNIST dataset [41] with 50k, 10k, and 10k image samples allocated for training, validation and testing purposes, respectively. We apply learn2learn [1] (available: https://github.com/learnables/learn2learn) to partition the 100 classes from miniImageNet into subsets of 64, 16 and 20 for meta training, meta validation and meta testing, respectively. (A sketch of the Fashion-MNIST split appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'learn2learn' and implementing a '4-layer convolutional neural network (CNN) with ReLU activation' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The step sizes for different algorithms are chosen according to their theoretically suggested values. Let the outer iteration be indexed by t; for SUSTAIN we choose α_t = β_t = 0.1/(1+t)^(1/3) and tune for c_f and c_g (see Theorem 3.2); for stocBiO and HOAG we select α_t = d_α, β_t = d_β and tune for the parameters d_α and d_β in the range [0, 1]. For ANIL and ITD-BiO, we use the parameter selection suggested in [1, 19]. Specifically, for ANIL, we use an inner-loop stepsize of 0.1 and an outer-loop (meta) stepsize of 0.002. For ITD-BiO, we choose the inner-loop stepsize as 0.05 and the outer-loop stepsize as 0.005. For SUSTAIN, we choose the outer-loop stepsize α_t as ᾱ/(1+t)^(1/3) and choose ᾱ ∈ [0.1, 1], we choose the momentum parameter as c·α_t²/ᾱ² and tune for c ∈ {2, 5, 10, 15, 20}; finally, we fix the inner stepsize as 0.05. (See the schedule sketch after the table.) |
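
The report does not reproduce Algorithm 1, but to make the "double-momentum" idea concrete, below is a minimal, illustrative sketch of a STORM-style double-momentum recursion of the kind SUSTAIN builds on, run on a toy quadratic bilevel problem. The toy problem, the noise model, the constants, and the helper names (`grad_y_g`, `hypergrad_f`) are assumptions made for illustration; they are not taken from the paper or its code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel problem (illustrative only, not from the paper):
#   lower level:  g(x, y) = 0.5 * ||y - A x||^2            -> y*(x) = A x
#   upper level:  f(x, y) = 0.5 * ||y - b||^2 + 0.5 * lam * ||x||^2
A = rng.standard_normal((5, 3)) / 3.0
b = rng.standard_normal(5)
lam = 0.1
sigma = 0.01  # std of additive noise standing in for stochastic samples

def grad_y_g(x, y, xi):
    """Stochastic gradient of g w.r.t. y, evaluated with sample noise xi."""
    return y - A @ x + xi

def hypergrad_f(x, y, xi):
    """Stochastic hypergradient estimate; for this g, grad_yy(g) = I and
    grad_xy(g) = -A^T, so the implicit term reduces to A^T (y - b)."""
    return lam * x + A.T @ (y - b) + xi

x, y = np.zeros(3), np.zeros(5)
x_prev, y_prev = x.copy(), y.copy()
hx = hypergrad_f(x, y, np.zeros(3))  # momentum estimate, upper level
hy = grad_y_g(x, y, np.zeros(5))     # momentum estimate, lower level

for t in range(500):
    alpha = 0.1 / (1 + t) ** (1 / 3)             # outer step size
    beta = alpha                                 # inner step size
    eta = min(1.0, 2.0 * alpha ** 2 / 0.1 ** 2)  # momentum weight ~ alpha_t^2
    xi_f = sigma * rng.standard_normal(3)
    xi_g = sigma * rng.standard_normal(5)
    # STORM-style recursions: a fresh stochastic gradient plus a correction
    # that reuses the same sample at the previous iterate.
    hy = grad_y_g(x, y, xi_g) + (1 - eta) * (hy - grad_y_g(x_prev, y_prev, xi_g))
    hx = hypergrad_f(x, y, xi_f) + (1 - eta) * (hx - hypergrad_f(x_prev, y_prev, xi_f))
    x_prev, y_prev = x.copy(), y.copy()
    y = y - beta * hy
    x = x - alpha * hx

# Sanity check: the deterministic hypergradient at (x, y*(x)) should be small.
print(np.linalg.norm(hypergrad_f(x, A @ x, np.zeros(3))))
```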
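
For the Fashion-MNIST 50k/10k/10k split quoted in the Dataset Splits row, here is a minimal sketch using torchvision. Fashion-MNIST ships as 60k training and 10k test images, so the validation set is carved out of the official training split; the seed, transform, and root path are assumptions, since the paper does not say how the validation subset was drawn.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Official Fashion-MNIST: 60k train / 10k test. The paper reports 50k/10k/10k
# for training/validation/testing, so split the training set 50k/10k
# (the fixed seed below is an assumption, not from the paper).
full_train = datasets.FashionMNIST(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.FashionMNIST(root="./data", train=False, download=True, transform=to_tensor)
train_set, val_set = random_split(
    full_train, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
)

print(len(train_set), len(val_set), len(test_set))  # 50000 10000 10000
```

The miniImageNet 64/16/20 class partition quoted above is produced by learn2learn itself, so it is not re-implemented here.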
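
Finally, a small sketch of the step-size and momentum schedules quoted in the Experiment Setup row; the function name, the defaults, and the clipping of the momentum weight to 1 are assumptions (ᾱ and c are the constants the paper tunes).

```python
def sustain_schedules(t, alpha_bar=0.1, c=10.0):
    """Schedules at outer iteration t, following the setup quoted above:
    alpha_t = alpha_bar / (1 + t)^(1/3) and a momentum parameter
    c * alpha_t^2 / alpha_bar^2 (clipped to 1 here so it stays a valid
    convex-combination weight; the clipping is an assumption)."""
    alpha_t = alpha_bar / (1 + t) ** (1 / 3)
    momentum_t = min(1.0, c * alpha_t ** 2 / alpha_bar ** 2)
    return alpha_t, momentum_t

# First few outer iterations
for t in range(4):
    print(t, sustain_schedules(t))
```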