Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Stability and Generalization of Bilevel Programming in Hyperparameter Optimization
Authors: Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments on feature learning and data reweighting for noisy labels, we corroborate our theoretical findings. |
| Researcher Affiliation | Academia | Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang. Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua-Huawei Joint Center for AI, BNRist Center, State Key Lab for Intell. Tech. & Sys., Tsinghua University, Beijing, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Unrolled differentiation for hyperparameter optimization; Algorithm 2 Cross-validation for hyperparameter optimization |
| Open Source Code | Yes | See https://github.com/baofff/stability_ho. |
| Open Datasets | Yes | In feature learning, we evaluate all algorithms on the Omniglot dataset [22] following [10]. [...] In data reweighting, we evaluate all algorithms on the MNIST dataset [23] following [35]. |
| Dataset Splits | Yes | We randomly select 100 classes and obtain a training, validation and testing set of size 500, 100, and 1000 respectively. [...] We randomly select 2000, 200, and 1000 images for training, validation and testing respectively. |
| Hardware Specification | Yes | Each experiment takes at most 10 hours in one GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions using SGD but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We employ a mini-batch version of SGD in both levels of UD with a learning rate 0.1 and batch size 50. [...] We employ a mini-batch version of SGD in both levels of UD with a batch size 100. The learning rate is 10 in the outer level and 0.3 in the inner level. |
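
For readers unfamiliar with the unrolled differentiation (UD) named in the Pseudocode and Experiment Setup rows, the following is a minimal sketch on a toy scalar problem, not the authors' implementation (see the repository linked above for that). The quadratic losses, the function names, and the step counts are assumptions chosen for illustration; only the learning rate of 0.1 in both levels mirrors the feature learning setting quoted in the table, and the sketch uses full-batch gradient steps rather than mini-batch SGD.

```python
# Toy unrolled differentiation for a single scalar hyperparameter.
# Assumed (hypothetical) losses, chosen only so the derivatives are easy to check:
#   inner (training) loss:   L_tr(w, lam) = 0.5 * (w - lam) ** 2
#   outer (validation) loss: L_val(w)     = 0.5 * (w - target) ** 2


def unrolled_hypergradient(lam, target, inner_lr=0.1, inner_steps=20):
    """Run the inner loop from w = 0 and return (w_K, dL_val/dlam)."""
    w = 0.0
    dw_dlam = 0.0  # derivative of the current iterate w.r.t. lam
    for _ in range(inner_steps):
        grad_w = w - lam  # dL_tr/dw
        # Differentiate the update w <- w - inner_lr * (w - lam) w.r.t. lam.
        dw_dlam = (1.0 - inner_lr) * dw_dlam + inner_lr
        w = w - inner_lr * grad_w
    # Chain rule through the unrolled iterates: dL_val/dw * dw_K/dlam.
    hypergrad = (w - target) * dw_dlam
    return w, hypergrad


def optimize_hyperparameter(target=1.0, outer_lr=0.1, outer_steps=500):
    """Outer loop: gradient descent on lam using the unrolled hypergradient."""
    lam = 0.0
    for _ in range(outer_steps):
        _, hypergrad = unrolled_hypergradient(lam, target)
        lam -= outer_lr * hypergrad
    w, _ = unrolled_hypergradient(lam, target)
    return lam, w


if __name__ == "__main__":
    lam, w = optimize_hyperparameter()
    print(f"lam = {lam:.4f}, w_K = {w:.4f}")
```

Because the inner loop is truncated at a finite number of steps, the learned hyperparameter overshoots the target so that the final inner iterate lands on it; the outer learning rate of 10 reported for data reweighting would diverge on this particular toy problem, so it is not used here.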