Rethinking Bi-Level Optimization in Neural Architecture Search: A Gibbs Sampling Perspective
Authors: Chao Xue, Xiaoxing Wang, Junchi Yan, Yonggang Hu, Xiaokang Yang, Kewei Sun
AAAI 2021, pp. 10551-10559
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed NAS method Gibbs NAS on the search space used in DARTS/ENAS as well as the search space of NAS-Bench-201. Experimental results on multiple search spaces show the efficacy and stability of our approach. Gibbs NAS is evaluated in three settings: 1) the micro cell-based search space used in ENAS (Pham et al. 2018) and DARTS (Liu, Simonyan, and Yang 2019); 2) the search space derived from NAS-Bench-201 (Dong and Yang 2020); 3) transferable performance of ImageNet (Russakovsky et al. 2015) classification from the basic cell searched on CIFAR-10 (Krizhevsky, Hinton et al. 2009a). |
| Researcher Affiliation | Collaboration | Chao Xue1, Xiaoxing Wang2, Junchi Yan2, Yonggang Hu3, Xiaokang Yang2, Kewei Sun1. 1 IBM Research China; 2 Shanghai Jiao Tong University; 3 IBM System. {xuechao,sunkewei}@cn.ibm.com, {wxx,yanjunchi,xkyang}@sjtu.edu.cn, yhu@ca.ibm.com |
| Pseudocode | Yes | Algorithm 1 Gibbs NAS: Uncertainty-Aware One-Shot Neural Architecture Search by Gibbs Sampling |
| Open Source Code | No | The paper does not provide any explicit statement or link for the open-sourcing of the described methodology's code. |
| Open Datasets | Yes | Experimental results on multiple search spaces show the efficacy and stability of our approach. ... Gibbs NAS is evaluated in three settings: 1) the micro cell-based search space used in ENAS (Pham et al. 2018) and DARTS (Liu, Simonyan, and Yang 2019); 2) the search space derived from NAS-Bench-201 (Dong and Yang 2020); 3) transferable performance of ImageNet (Russakovsky et al. 2015) classification from the basic cell searched on CIFAR-10 (Krizhevsky, Hinton et al. 2009a). |
| Dataset Splits | Yes | We follow the setting of DARTS to update ω and α by train set and validation set, respectively. For fairness, we follow the training settings and split protocol as the original paper (Dong and Yang 2020). |
| Hardware Specification | Yes | In the first setting and the third setting, experiments are run on one Tesla V100, while in the second setting experiments are performed on one Tesla K80. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or library versions). |
| Experiment Setup | Yes | For the network weights ω updates (Eq. 15), we set initial learning rate to 0.1 with batch size 64. For the architecture importance α updates (Eq. 16), we set initial learning rate to 0.1 with batch size 2048. ... we set both weight decay (σ²_ω) and importance decay (σ²_α) to 1e-4. In the first setting and the third setting, experiments are run on one Tesla V100, while in the second setting experiments are performed on one Tesla K80. |
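The Dataset Splits and Experiment Setup rows together describe a DARTS-style bi-level loop: the weights ω are updated on the train split (initial lr 0.1, batch size 64, weight decay σ²_ω = 1e-4), and the architecture importance α on the validation split (initial lr 0.1, batch size 2048, importance decay σ²_α = 1e-4). Since no code or framework is released, the following is only a minimal PyTorch-style sketch of that optimizer configuration under stated assumptions: the `ToySuperNet` module, its `weight_parameters()`/`arch_parameters()` accessors, the synthetic data, and the use of plain SGD for both levels are illustrative choices, not the authors' implementation, and the Gibbs-sampling step of Algorithm 1 is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class ToySuperNet(nn.Module):
    """Toy one-shot network with a single mixed edge (illustrative only)."""

    def __init__(self, in_dim=32, hidden=64, num_classes=10, num_ops=4):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(in_dim, hidden) for _ in range(num_ops)])
        self.head = nn.Linear(hidden, num_classes)
        # Architecture importance α over candidate operations.
        self.alpha = nn.Parameter(1e-3 * torch.randn(num_ops))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=-1)
        h = sum(w * op(x) for w, op in zip(weights, self.ops))
        return self.head(F.relu(h))

    def weight_parameters(self):
        return [p for n, p in self.named_parameters() if n != "alpha"]

    def arch_parameters(self):
        return [self.alpha]


# Synthetic stand-ins for the train/validation splits used in the paper.
train_set = TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,)))
valid_set = TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,)))

net = ToySuperNet()
criterion = nn.CrossEntropyLoss()

# ω updates (Eq. 15): initial lr 0.1, batch size 64, weight decay σ²_ω = 1e-4.
w_opt = torch.optim.SGD(net.weight_parameters(), lr=0.1, weight_decay=1e-4)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# α updates (Eq. 16): initial lr 0.1, batch size 2048, importance decay σ²_α = 1e-4.
a_opt = torch.optim.SGD(net.arch_parameters(), lr=0.1, weight_decay=1e-4)
valid_loader = DataLoader(valid_set, batch_size=2048, shuffle=True)

for epoch in range(2):  # token epoch count for the sketch
    for (x_tr, y_tr), (x_va, y_va) in zip(train_loader, valid_loader):
        # Lower level: update ω on the train split.
        w_opt.zero_grad()
        criterion(net(x_tr), y_tr).backward()
        w_opt.step()

        # Upper level: update α on the validation split.
        a_opt.zero_grad()
        criterion(net(x_va), y_va).backward()
        a_opt.step()
```

The paper specifies the initial learning rates, batch sizes, and decay values but not the optimizer family, momentum, or learning-rate schedules, so plain SGD is used here only to keep the sketch short.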