Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

Authors: Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments on feature learning and data reweighting for noisy labels, we corroborate our theoretical findings.
Researcher Affiliation | Academia | Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang; Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua-Huawei Joint Center for AI, BNRist Center, State Key Lab for Intell. Tech. & Sys., Tsinghua University, Beijing, China; bf19@mails.tsinghua.edu.cn, {guoqiangwu90, chongxuanli1991}@gmail.com, {dcszj, dcszb}@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Unrolled differentiation for hyperparameter optimization; Algorithm 2: Cross-validation for hyperparameter optimization (a minimal sketch of unrolled differentiation follows the table).
Open Source Code | Yes | See https://github.com/baofff/stability_ho.
Open Datasets | Yes | In feature learning, we evaluate all algorithms on the Omniglot dataset [22] following [10]. [...] In data reweighting, we evaluate all algorithms on the MNIST dataset [23] following [35].
Dataset Splits | Yes | We randomly select 100 classes and obtain a training, validation and testing set of size 500, 100, and 1000 respectively. [...] We randomly select 2000, 200, and 1000 images for training, validation and testing respectively.
Hardware Specification | Yes | Each experiment takes at most 10 hours on one GeForce GTX 1080 Ti GPU.
Software Dependencies | No | The paper mentions using SGD but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We employ a mini-batch version of SGD in both levels of UD with a learning rate 0.1 and batch size 50. [...] We employ a mini-batch version of SGD in both levels of UD with a batch size 100. The learning rate is 10 in the outer level and 0.3 in the inner level. (A configuration sketch for the data-reweighting setup follows the table.)
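
The Pseudocode row above references Algorithm 1, unrolled differentiation (UD) for hyperparameter optimization. Below is a minimal sketch of UD, assuming PyTorch; the ridge-regression objective, tensor shapes, iteration counts, and full-batch losses are illustrative placeholders, while the learning rate 0.1 at both levels mirrors the quoted feature-learning setup.

import torch

# Hypothetical inner objective: ridge-regularized least squares, with the
# (log) regularization strength lam as the outer-level hyperparameter.
def inner_loss(w, lam, x, y):
    return ((x @ w - y) ** 2).mean() + torch.exp(lam) * (w ** 2).sum()

# Outer objective: validation loss evaluated at the unrolled inner parameters.
def outer_loss(w, x, y):
    return ((x @ w - y) ** 2).mean()

torch.manual_seed(0)
x_tr, y_tr = torch.randn(500, 10), torch.randn(500)     # training split (illustrative)
x_val, y_val = torch.randn(100, 10), torch.randn(100)   # validation split (illustrative)

lam = torch.tensor(0.0, requires_grad=True)             # hyperparameter (outer variable)
outer_opt = torch.optim.SGD([lam], lr=0.1)              # outer-level SGD, lr 0.1 as quoted

for outer_step in range(50):
    w = torch.zeros(10, requires_grad=True)             # inner variable, re-initialized
    for t in range(20):                                  # T unrolled inner SGD steps
        g = torch.autograd.grad(inner_loss(w, lam, x_tr, y_tr), w, create_graph=True)[0]
        w = w - 0.1 * g                                  # inner-level SGD, lr 0.1 as quoted
    outer_opt.zero_grad()
    outer_loss(w, x_val, y_val).backward()               # hypergradient via the unrolled graph
    outer_opt.step()

In words: the inner level is run as differentiable SGD steps kept in the autograd graph, and the outer level backpropagates the validation loss through that unrolled trajectory to update the hyperparameter.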
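
For the data-reweighting experiment in the Experiment Setup row, a configuration sketch is given below, again assuming PyTorch. The number of training images (2000), batch size 100, and the learning rates (0.3 inner, 10 outer) follow the quoted splits and setup; the logistic parameterization of per-example weights and the linear classifier are assumptions made only for illustration.

import torch
import torch.nn.functional as F

N_TRAIN, BATCH_SIZE = 2000, 100       # 2000 training images, batch size 100 as quoted
INNER_LR, OUTER_LR = 0.3, 10.0        # inner / outer learning rates as quoted

# Outer variables: one reweighting logit per (possibly noisy-labeled) training example.
weight_logits = torch.zeros(N_TRAIN, requires_grad=True)
outer_opt = torch.optim.SGD([weight_logits], lr=OUTER_LR)

def weighted_inner_loss(w, x, y, idx):
    # Weighted cross-entropy on a training mini-batch; w are the parameters of
    # an illustrative linear classifier on flattened 28x28 MNIST images.
    per_example = F.cross_entropy(x @ w, y, reduction='none')
    return (torch.sigmoid(weight_logits[idx]) * per_example).mean()

# Example of drawing the mini-batch indices used for one inner step.
idx = torch.randint(0, N_TRAIN, (BATCH_SIZE,))

# The hypergradient of the validation loss with respect to weight_logits is
# obtained exactly as in the UD sketch above: unroll mini-batch SGD steps on
# weighted_inner_loss with step size INNER_LR (keeping create_graph=True),
# evaluate the validation loss at the unrolled parameters, call backward(),
# and take an outer_opt.step().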