Rethinking Bi-Level Optimization in Neural Architecture Search: A Gibbs Sampling Perspective

Authors: Chao Xue, Xiaoxing Wang, Junchi Yan, Yonggang Hu, Xiaokang Yang, Kewei Sun (pp. 10551-10559)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed NAS method, Gibbs NAS, on the search space used in DARTS/ENAS as well as the search space of NAS-Bench-201. Experimental results on multiple search spaces show the efficacy and stability of our approach. Gibbs NAS is evaluated in three settings: 1) the micro-cell-based search space used in ENAS (Pham et al. 2018) and DARTS (Liu, Simonyan, and Yang 2019); 2) the search space derived from NAS-Bench-201 (Dong and Yang 2020); 3) transferable performance of ImageNet (Russakovsky et al. 2015) classification from the basic cell searched on CIFAR-10 (Krizhevsky, Hinton et al. 2009a).
Researcher Affiliation | Collaboration | Chao Xue (1), Xiaoxing Wang (2), Junchi Yan (2), Yonggang Hu (3), Xiaokang Yang (2), Kewei Sun (1); 1 IBM Research China, 2 Shanghai Jiao Tong University, 3 IBM System; {xuechao,sunkewei}@cn.ibm.com, {wxx,yanjunchi,xkyang}@sjtu.edu.cn, yhu@ca.ibm.com
Pseudocode | Yes | Algorithm 1: "Gibbs NAS: Uncertainty-Aware One-Shot Neural Architecture Search by Gibbs Sampling" (a hedged sketch of such an alternating sampling loop follows the table).
Open Source Code | No | The paper does not provide an explicit statement or link indicating that code for the described method is open-sourced.
Open Datasets | Yes | Experimental results on multiple search spaces show the efficacy and stability of our approach. ... Gibbs NAS is evaluated in three settings: 1) the micro-cell-based search space used in ENAS (Pham et al. 2018) and DARTS (Liu, Simonyan, and Yang 2019); 2) the search space derived from NAS-Bench-201 (Dong and Yang 2020); 3) transferable performance of ImageNet (Russakovsky et al. 2015) classification from the basic cell searched on CIFAR-10 (Krizhevsky, Hinton et al. 2009a).
Dataset Splits | Yes | We follow the setting of DARTS to update ω and α on the train set and validation set, respectively. For fairness, we follow the training settings and split protocol of the original paper (Dong and Yang 2020). (A DARTS-style split sketch follows the table.)
Hardware Specification | Yes | In the first setting and the third setting, experiments are run on one Tesla V100, while in the second setting experiments are performed on one Tesla K80.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or library versions).
Experiment Setup | Yes | For the network weight (ω) updates (Eq. 15), we set the initial learning rate to 0.1 with batch size 64. For the architecture importance (α) updates (Eq. 16), we set the initial learning rate to 0.1 with batch size 2048. ... we set both the weight decay (σ_ω²) and the importance decay (σ_α²) to 1e-4. In the first setting and the third setting, experiments are run on one Tesla V100, while in the second setting experiments are performed on one Tesla K80. (An optimizer-configuration sketch follows the table.)
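
The Pseudocode row cites Algorithm 1 (Gibbs NAS). As an illustration only, the following is a minimal, dependency-light sketch of the kind of alternating loop such a Gibbs-sampling view of bi-level NAS implies: the network weights ω are refreshed conditioned on the architecture importances α using the train split, and α is refreshed conditioned on ω using the validation split. The toy model, the losses, the noisy-gradient stand-ins for conditional sampling, and every step size and noise scale below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical Gibbs-style alternating loop for a toy one-shot NAS problem.
# Nothing here reproduces the paper's Algorithm 1; it only mirrors its shape:
# refresh w | alpha on train data, then alpha | w on validation data.
import numpy as np

rng = np.random.default_rng(0)

def predict(x, w, alpha):
    """Toy 'supernet': two candidate ops (identity, squaring) mixed by softmax(alpha)."""
    probs = np.exp(alpha) / np.exp(alpha).sum()      # architecture mixture weights
    feats = np.stack([x, x ** 2], axis=1) @ probs    # weighted sum of candidate ops
    return w * feats                                 # w plays the role of network weights

def loss(x, y, w, alpha):
    return np.mean((predict(x, w, alpha) - y) ** 2)

def num_grad(f, v, eps=1e-4):
    """Central-difference gradient, so the sketch needs no autodiff library."""
    g = np.zeros_like(v)
    for i in range(v.size):
        hi, lo = v.copy(), v.copy()
        hi[i] += eps
        lo[i] -= eps
        g[i] = (f(hi) - f(lo)) / (2 * eps)
    return g

# Synthetic stand-ins for the train split (updates w) and validation split (updates alpha).
x_tr, x_val = rng.normal(size=256), rng.normal(size=256)
y_tr, y_val = 2.0 * x_tr ** 2, 2.0 * x_val ** 2

w = np.array([1.0])                    # "network weights"
alpha = np.zeros(2)                    # "architecture importances"
lr_w, lr_a, noise = 0.05, 0.5, 0.01    # assumed step sizes and noise scale

for step in range(200):
    # Refresh w | alpha on the train split: a noisy gradient step stands in for
    # drawing from the conditional over weights given the architecture.
    g_w = num_grad(lambda wv: loss(x_tr, y_tr, wv, alpha), w)
    w = w - lr_w * g_w + noise * rng.normal(size=w.shape)

    # Refresh alpha | w on the validation split, the second half of the Gibbs-style sweep.
    g_a = num_grad(lambda av: loss(x_val, y_val, w, av), alpha)
    alpha = alpha - lr_a * g_a + noise * rng.normal(size=alpha.shape)

print("mixture over {identity, square}:", np.round(np.exp(alpha) / np.exp(alpha).sum(), 3))
print("w:", np.round(w, 3))
```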
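For the Dataset Splits row, the quoted protocol follows DARTS, which splits the CIFAR-10 training set into two halves: one half updates the weights ω and the other half updates the architecture importances α. Below is a minimal sketch of such a split using standard torchvision/PyTorch utilities, reusing the batch sizes quoted in the Experiment Setup row; the 50/50 proportion comes from the DARTS protocol and is an assumption about the exact split used here.

```python
# DARTS-style 50/50 split of the CIFAR-10 training set (assumed proportions).
import numpy as np
import torch
import torchvision
import torchvision.transforms as T

train_data = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

num_train = len(train_data)              # 50,000 images
indices = np.random.permutation(num_train)
split = num_train // 2                   # half for w, half for alpha

weight_loader = torch.utils.data.DataLoader(   # updates w (Eq. 15), batch size 64
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:split]))

arch_loader = torch.utils.data.DataLoader(     # updates alpha (Eq. 16), batch size 2048
    train_data, batch_size=2048,
    sampler=torch.utils.data.SubsetRandomSampler(indices[split:]))
```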
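For the Experiment Setup row, the quoted hyperparameters (initial learning rate 0.1 for both updates, batch sizes 64 and 2048, and decays σ_ω² = σ_α² = 1e-4) can be mapped onto a plain PyTorch optimizer configuration. Only the learning rates and decays come from the quote; the choice of SGD, the momentum value, the cosine schedule, and the placeholder parameter shapes below are assumptions.

```python
# Hedged sketch of an optimizer configuration matching the quoted hyperparameters.
import torch
import torch.nn as nn

# Placeholder parameter groups standing in for the supernet weights w and the
# architecture importances alpha (shapes are arbitrary).
w_params = [nn.Parameter(torch.randn(16, 16))]
alpha_params = [nn.Parameter(torch.zeros(8, 5))]

# lr=0.1 and weight_decay=1e-4 come from the paper's quote; momentum is assumed.
w_optimizer = torch.optim.SGD(w_params, lr=0.1, momentum=0.9, weight_decay=1e-4)
alpha_optimizer = torch.optim.SGD(alpha_params, lr=0.1, weight_decay=1e-4)

# "Initial" learning rate suggests a schedule; cosine annealing is a common
# (assumed) choice in this line of work, with an illustrative horizon.
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=50)
```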