Rethinking Architecture Selection in Differentiable NAS
Authors: Ruochen Wang, Minhao Cheng, Xiangning Chen, Xiaocheng Tang, Cho-Jui Hsieh
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical and theoretical analysis to show that the magnitude of architecture parameters does not necessarily indicate how much the operation contributes to the supernet's performance. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it is able to extract significantly improved architectures from the underlying supernets consistently. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, UCLA; 2DiDi AI Labs. {ruocwang, mhcheng}@ucla.edu; {xiangning, chohsieh}@cs.ucla.edu; xiaochengtang@didiglobal.com |
| Pseudocode | Yes | Algorithm 1: Perturbation-based Architecture Selection |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of its methodology. |
| Open Datasets | Yes | The evaluation is based on the search space of DARTS and NAS-Bench-201 (Dong & Yang, 2020), and we show that the perturbation-based architecture selection method can be applied to several variants of DARTS. Every architecture in the search space is trained under the same protocol on three datasets (CIFAR-10, CIFAR-100, and ImageNet-16-120), and their performance can be obtained by querying the database. |
| Dataset Splits | No | The paper mentions 'validation accuracy' frequently but does not explicitly provide specific numerical splits (e.g., percentages or counts) for training, validation, and test sets. It implies standard splits are used for benchmark datasets but does not detail them. |
| Hardware Specification | Yes | Recorded on a single GTX 1080Ti GPU. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | We keep all the search and retrain settings identical to DARTS since our method only modifies the architecture selection part. After the search phase, we perform perturbation-based architecture selection following Algorithm 1 on the pretrained supernet. We tune the supernet for 5 epochs between two selections as it is enough for the supernet to recover from the drop of accuracy after discretization. |
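The selection procedure the table describes (Algorithm 1 with supernet tuning between selections) can be sketched as follows. This is a hedged illustration, not the authors' code: `val_acc` and `tune` are hypothetical stand-ins for evaluating the supernet on held-out validation data and for the 5-epoch recovery tuning; the real method operates on a trained DARTS supernet.

```python
# Sketch of perturbation-based architecture selection: on each edge,
# keep the operation whose removal (masking) causes the largest drop
# in supernet validation accuracy, rather than the one with the
# largest architecture parameter alpha.

def select_operation(edge_ops, val_acc):
    """Return the op on this edge whose masking hurts accuracy most.

    `val_acc(mask=op)` is a hypothetical evaluator returning supernet
    validation accuracy with `op` masked out (mask=None => no masking).
    """
    base = val_acc(mask=None)
    drops = {op: base - val_acc(mask=op) for op in edge_ops}
    return max(drops, key=drops.get)


def select_architecture(edges, val_acc, tune):
    """Discretize edges one at a time, tuning the supernet between
    two selections (the paper tunes for 5 epochs, enough to recover
    from the accuracy drop after each discretization)."""
    chosen = {}
    for edge, ops in edges.items():
        chosen[edge] = select_operation(ops, lambda mask: val_acc(edge, mask))
        tune()  # recover supernet accuracy before the next selection
    return chosen
```

In this sketch the contribution of an operation is measured directly by perturbing it, which is the paper's core point: a large alpha value need not coincide with a large accuracy drop when the operation is removed.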