MergeNAS: Merge Operations into One for Differentiable Architecture Search
Authors: Xiaoxing Wang, Chao Xue, Junchi Yan, Xiaokang Yang, Yonggang Hu, Kewei Sun
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on different search spaces and various datasets have been conducted to verify our approach, showing that MergeNAS can converge to a stable architecture and achieve better performance with fewer parameters and lower search cost. |
| Researcher Affiliation | Collaboration | Xiaoxing Wang¹, Chao Xue³, Junchi Yan²,¹, Xiaokang Yang¹, Yonggang Hu⁴ and Kewei Sun³; ¹MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; ²Department of Computer Science and Engineering, Shanghai Jiao Tong University; ³IBM Research China; ⁴IBM System |
| Pseudocode | Yes | Algorithm 1, "MergeNAS: Weight Merge for NAS" (a rough sketch of the weight-merge idea appears below the table). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | NAS-Bench-201 provides the detailed information of all possible architectures belonging to the specific search space, including the accuracy and latency on three datasets: CIFAR-10 [Krizhevsky et al., 2009], CIFAR-100 [Krizhevsky et al., 2009], and ImageNet-16-120 [Dong and Yang, 2020]. |
| Dataset Splits | Yes | NAS-Bench-201 has evaluated validation and test accuracy of all the possible architectures in the search space, so we can focus on the searching process and directly index for the information of the architecture obtained by our approach. ... We follow the settings in NAS-Bench-201 [Dong and Yang, 2020]. |
| Hardware Specification | Yes | The search cost only includes the time of the searching process on NVIDIA 1080 Ti. |
| Software Dependencies | No | The paper does not provide specific software details with version numbers (e.g., library names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Training Settings. We set the training settings similar to DARTS. In the search phase, we train the one-shot model stacked by 8 cells for 50 epochs, and optimize the architecture parameters and the network weights based on the two-step iteration of DARTS [Liu et al., 2019]. An SGD optimizer with momentum 0.9 and initial learning rate 0.025 is used to optimize the network weights. An Adam optimizer with initial learning rate 10^-4 is used to optimize the architecture parameters. ... In the evaluation phase, a large network stacked by 20 cells ... is trained from scratch with batch size 96. ... We follow the settings in NAS-Bench-201 [Dong and Yang, 2020]. In the search phase, we train the one-shot model stacked by 17 cells (including 5 normal cells in each resolution) for 50 epochs and only search the structure of the normal cell. We optimize the network weights with an SGD optimizer with momentum 0.9, and the architecture parameters with an Adam optimizer whose β equals (0.5, 0.999). Inspired by the recent work [Zela et al., 2020], we increase the weight decay up to 2e-2 for both DARTS and MergeNAS. (A hedged sketch of the quoted search-phase optimizer setup appears below the table.) |
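
The report above only names Algorithm 1 and does not reproduce it. As a rough illustration of the "weight merge" idea the title refers to, the PyTorch sketch below merges two candidate convolutions on one edge (a 3x3 and a 5x5) into a single convolution by zero-padding the smaller kernel and summing the softmax-weighted kernels, which is valid because convolution is linear in its kernel. The class `MergedConvEdge`, the candidate set, and all tensor shapes are hypothetical illustrations of this general re-parameterization, not the paper's exact operation set or formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergedConvEdge(nn.Module):
    """Hypothetical sketch: merge a 3x3 and a 5x5 convolution candidate into one
    5x5 convolution whose kernel is the softmax-weighted sum of the candidates."""

    def __init__(self, channels):
        super().__init__()
        self.w3 = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        self.w5 = nn.Parameter(torch.randn(channels, channels, 5, 5) * 0.01)
        self.alpha = nn.Parameter(torch.zeros(2))  # architecture parameters for this edge

    def forward(self, x):
        a = F.softmax(self.alpha, dim=0)
        # Zero-pad the 3x3 kernel to 5x5 so both candidates share one kernel shape.
        w3_padded = F.pad(self.w3, (1, 1, 1, 1))
        merged_kernel = a[0] * w3_padded + a[1] * self.w5
        # A single convolution replaces the weighted sum of two convolutions,
        # because convolution is linear in its kernel.
        return F.conv2d(x, merged_kernel, padding=2)

x = torch.randn(2, 16, 32, 32)
edge = MergedConvEdge(16)
print(edge(x).shape)  # torch.Size([2, 16, 32, 32])
```

Merging the candidates this way means only one convolution is executed per edge during search, which is consistent with the paper's stated goal of fewer parameters and lower search cost, though the exact savings depend on the paper's full operation set.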
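
As a concrete reading of the quoted search-phase settings, the sketch below wires up a DARTS-style two-step iteration with the reported optimizers: SGD with momentum 0.9 and initial learning rate 0.025 for the network weights, and Adam with learning rate 10^-4 and β = (0.5, 0.999) for the architecture parameters. The `TinySupernet` model, the toy data, and the number of steps are placeholders for illustration only and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySupernet(nn.Module):
    """Toy stand-in for the one-shot model; only the optimizer wiring matters here."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.fc = nn.Linear(16, num_classes)
        self.alpha = nn.Parameter(torch.zeros(2))  # toy architecture parameters

    def forward(self, x):
        a = F.softmax(self.alpha, dim=0)
        h = a[0] * F.relu(self.conv(x)) + a[1] * self.conv(x)
        return self.fc(h.mean(dim=(2, 3)))

    def weight_parameters(self):
        return [p for n, p in self.named_parameters() if n != "alpha"]

    def arch_parameters(self):
        return [self.alpha]

model = TinySupernet()
criterion = nn.CrossEntropyLoss()

# Optimizers as quoted: SGD (momentum 0.9, lr 0.025) for network weights,
# Adam (lr 1e-4, betas (0.5, 0.999)) for architecture parameters.
w_opt = torch.optim.SGD(model.weight_parameters(), lr=0.025, momentum=0.9)
a_opt = torch.optim.Adam(model.arch_parameters(), lr=1e-4, betas=(0.5, 0.999))

# One toy batch each for "training" and "validation" data.
x_tr, y_tr = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_va, y_va = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))

for step in range(3):  # the paper reports 50 search epochs; 3 toy steps here
    # Step 1 of the two-step iteration: architecture parameters on validation data.
    a_opt.zero_grad()
    criterion(model(x_va), y_va).backward()
    a_opt.step()
    # Step 2: network weights on training data.
    w_opt.zero_grad()
    criterion(model(x_tr), y_tr).backward()
    w_opt.step()
```

This sketch uses the first-order alternation; whether second-order gradients, weight decay on the weights, or a learning-rate schedule are applied exactly as in the paper is not determined by the quoted text.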