Rethinking Bi-Level Optimization in Neural Architecture Search: A Gibbs Sampling Perspective

Authors: Chao Xue, Xiaoxing Wang, Junchi Yan, Yonggang Hu, Xiaokang Yang, Kewei Sun (pp. 10551-10559)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed NAS method, Gibbs NAS, on the search space used in DARTS/ENAS as well as the search space of NAS-Bench-201. Experimental results on multiple search spaces show the efficacy and stability of our approach. Gibbs NAS is evaluated in three settings: 1) the micro-cell-based search space used in ENAS (Pham et al. 2018) and DARTS (Liu, Simonyan, and Yang 2019); 2) the search space derived from NAS-Bench-201 (Dong and Yang 2020); 3) transferable performance of ImageNet (Russakovsky et al. 2015) classification from the basic cell searched on CIFAR-10 (Krizhevsky, Hinton et al. 2009a).
Researcher Affiliation | Collaboration | Chao Xue (1), Xiaoxing Wang (2), Junchi Yan (2), Yonggang Hu (3), Xiaokang Yang (2), Kewei Sun (1); 1 IBM Research China, 2 Shanghai Jiao Tong University, 3 IBM System; {xuechao,sunkewei}@cn.ibm.com, {wxx,yanjunchi,xkyang}@sjtu.edu.cn, yhu@ca.ibm.com
Pseudocode | Yes | Algorithm 1: "Gibbs NAS: Uncertainty-Aware One-Shot Neural Architecture Search by Gibbs Sampling" (a hedged sketch of such an alternating sampling loop follows the table).
Open Source Code | No | The paper does not provide an explicit statement or link indicating that code for the described method is open-sourced.
Open Datasets | Yes | Experimental results on multiple search spaces show the efficacy and stability of our approach. ... Gibbs NAS is evaluated in three settings: 1) the micro-cell-based search space used in ENAS (Pham et al. 2018) and DARTS (Liu, Simonyan, and Yang 2019); 2) the search space derived from NAS-Bench-201 (Dong and Yang 2020); 3) transferable performance of ImageNet (Russakovsky et al. 2015) classification from the basic cell searched on CIFAR-10 (Krizhevsky, Hinton et al. 2009a).
Dataset Splits | Yes | We follow the setting of DARTS to update ω and α on the train set and validation set, respectively. For fairness, we follow the training settings and split protocol of the original paper (Dong and Yang 2020). (A DARTS-style split sketch follows the table.)
Hardware Specification | Yes | In the first setting and the third setting, experiments are run on one Tesla V100, while in the second setting experiments are performed on one Tesla K80.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or library versions).
Experiment Setup | Yes | For the network weight (ω) updates (Eq. 15), we set the initial learning rate to 0.1 with batch size 64. For the architecture importance (α) updates (Eq. 16), we set the initial learning rate to 0.1 with batch size 2048. ... we set both the weight decay (σ_ω²) and the importance decay (σ_α²) to 1e-4. In the first setting and the third setting, experiments are run on one Tesla V100, while in the second setting experiments are performed on one Tesla K80. (An optimizer-configuration sketch follows the table.)
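
The Pseudocode row cites Algorithm 1 (Gibbs NAS). As an illustration only, the following is a minimal, dependency-light sketch of the kind of alternating loop such a Gibbs-sampling view of bi-level NAS implies: the network weights ω are refreshed conditioned on the architecture importances α using the train split, and α is refreshed conditioned on ω using the validation split. The toy model, the losses, the noisy-gradient stand-ins for conditional sampling, and every step size and noise scale below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical Gibbs-style alternating loop for a toy one-shot NAS problem.
# Nothing here reproduces the paper's Algorithm 1; it only mirrors its shape:
# refresh w | alpha on train data, then alpha | w on validation data.
import numpy as np

rng = np.random.default_rng(0)

def predict(x, w, alpha):
    """Toy 'supernet': two candidate ops (identity, squaring) mixed by softmax(alpha)."""
    probs = np.exp(alpha) / np.exp(alpha).sum()      # architecture mixture weights
    feats = np.stack([x, x ** 2], axis=1) @ probs    # weighted sum of candidate ops
    return w * feats                                 # w plays the role of network weights

def loss(x, y, w, alpha):
    return np.mean((predict(x, w, alpha) - y) ** 2)

def num_grad(f, v, eps=1e-4):
    """Central-difference gradient, so the sketch needs no autodiff library."""
    g = np.zeros_like(v)
    for i in range(v.size):
        hi, lo = v.copy(), v.copy()
        hi[i] += eps
        lo[i] -= eps
        g[i] = (f(hi) - f(lo)) / (2 * eps)
    return g

# Synthetic stand-ins for the train split (updates w) and validation split (updates alpha).
x_tr, x_val = rng.normal(size=256), rng.normal(size=256)
y_tr, y_val = 2.0 * x_tr ** 2, 2.0 * x_val ** 2

w = np.array([1.0])                    # "network weights"
alpha = np.zeros(2)                    # "architecture importances"
lr_w, lr_a, noise = 0.05, 0.5, 0.01    # assumed step sizes and noise scale

for step in range(200):
    # Refresh w | alpha on the train split: a noisy gradient step stands in for
    # drawing from the conditional over weights given the architecture.
    g_w = num_grad(lambda wv: loss(x_tr, y_tr, wv, alpha), w)
    w = w - lr_w * g_w + noise * rng.normal(size=w.shape)

    # Refresh alpha | w on the validation split, the second half of the Gibbs-style sweep.
    g_a = num_grad(lambda av: loss(x_val, y_val, w, av), alpha)
    alpha = alpha - lr_a * g_a + noise * rng.normal(size=alpha.shape)

print("mixture over {identity, square}:", np.round(np.exp(alpha) / np.exp(alpha).sum(), 3))
print("w:", np.round(w, 3))
```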
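For the Dataset Splits row, the quoted protocol follows DARTS, which splits the CIFAR-10 training set into two halves: one half updates the weights ω and the other half updates the architecture importances α. Below is a minimal sketch of such a split using standard torchvision/PyTorch utilities, reusing the batch sizes quoted in the Experiment Setup row; the 50/50 proportion comes from the DARTS protocol and is an assumption about the exact split used here.

```python
# DARTS-style 50/50 split of the CIFAR-10 training set (assumed proportions).
import numpy as np
import torch
import torchvision
import torchvision.transforms as T

train_data = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

num_train = len(train_data)              # 50,000 images
indices = np.random.permutation(num_train)
split = num_train // 2                   # half for w, half for alpha

weight_loader = torch.utils.data.DataLoader(   # updates w (Eq. 15), batch size 64
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:split]))

arch_loader = torch.utils.data.DataLoader(     # updates alpha (Eq. 16), batch size 2048
    train_data, batch_size=2048,
    sampler=torch.utils.data.SubsetRandomSampler(indices[split:]))
```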
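For the Experiment Setup row, the quoted hyperparameters (initial learning rate 0.1 for both updates, batch sizes 64 and 2048, and decays σ_ω² = σ_α² = 1e-4) can be mapped onto a plain PyTorch optimizer configuration. Only the learning rates and decays come from the quote; the choice of SGD, the momentum value, the cosine schedule, and the placeholder parameter shapes below are assumptions.

```python
# Hedged sketch of an optimizer configuration matching the quoted hyperparameters.
import torch
import torch.nn as nn

# Placeholder parameter groups standing in for the supernet weights w and the
# architecture importances alpha (shapes are arbitrary).
w_params = [nn.Parameter(torch.randn(16, 16))]
alpha_params = [nn.Parameter(torch.zeros(8, 5))]

# lr=0.1 and weight_decay=1e-4 come from the paper's quote; momentum is assumed.
w_optimizer = torch.optim.SGD(w_params, lr=0.1, momentum=0.9, weight_decay=1e-4)
alpha_optimizer = torch.optim.SGD(alpha_params, lr=0.1, weight_decay=1e-4)

# "Initial" learning rate suggests a schedule; cosine annealing is a common
# (assumed) choice in this line of work, with an illustrative horizon.
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=50)
```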