Rethinking Architecture Selection in Differentiable NAS

Authors: Ruochen Wang, Minhao Cheng, Xiangning Chen, Xiaocheng Tang, Cho-Jui Hsieh

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical and theoretical analysis to show that the magnitude of architecture parameters does not necessarily indicate how much the operation contributes to the supernet's performance. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it is able to consistently extract significantly improved architectures from the underlying supernets.
Researcher Affiliation | Collaboration | 1Department of Computer Science, UCLA; 2DiDi AI Labs. {ruocwang, mhcheng}@ucla.edu, {xiangning, chohsieh}@cs.ucla.edu, xiaochengtang@didiglobal.com
Pseudocode | Yes | Algorithm 1: Perturbation-based Architecture Selection
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of its methodology.
Open Datasets | Yes | The evaluation is based on the search space of DARTS and NAS-Bench-201 (Dong & Yang, 2020), and we show that the perturbation-based architecture selection method can be applied to several variants of DARTS. Every architecture in the search space is trained under the same protocol on three datasets (CIFAR-10, CIFAR-100, and ImageNet-16-120), and their performance can be obtained by querying the database.
Dataset Splits | No | The paper mentions 'validation accuracy' frequently but does not explicitly provide specific numerical splits (e.g., percentages or counts) for training, validation, and test sets. It implies standard splits are used for the benchmark datasets but does not detail them.
Hardware Specification | Yes | Recorded on a single GTX 1080Ti GPU.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers.
Experiment Setup | Yes | We keep all the search and retrain settings identical to DARTS since our method only modifies the architecture selection part. After the search phase, we perform perturbation-based architecture selection following Algorithm 1 on the pretrained supernet. We tune the supernet for 5 epochs between two selections as it is enough for the supernet to recover from the drop of accuracy after discretization.
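
For context, below is a minimal sketch of the perturbation-based selection loop as described in the Experiment Setup quote and Algorithm 1: for each edge of the pretrained supernet, each candidate operation is masked out in turn, the drop in supernet validation accuracy is measured, the operation whose removal hurts accuracy the most is kept for that edge, and the supernet is tuned for a few epochs before the next edge is processed. This is not the authors' released code; the supernet interface (edges, candidate_ops, validate, tune) and all names are illustrative assumptions.

# Hedged sketch of perturbation-based architecture selection (assumed interface).
from typing import Callable, Dict, List, Sequence

def select_architecture(
    edges: Sequence[str],                          # e.g. ["edge_0_1", "edge_0_2", ...]
    candidate_ops: Dict[str, List[str]],           # candidate operations per edge
    validate: Callable[[Dict[str, str]], float],   # val. accuracy with the given ops masked out
    tune: Callable[[int], None],                   # fine-tune the supernet for n epochs
    tune_epochs: int = 5,                          # the paper tunes 5 epochs between selections
) -> Dict[str, str]:
    """Greedily pick, for each edge, the op whose removal hurts supernet accuracy most."""
    selected: Dict[str, str] = {}
    for edge in edges:
        base_acc = validate({})                    # accuracy of the current (unmasked) supernet
        best_op = candidate_ops[edge][0]
        worst_drop = float("-inf")
        for op in candidate_ops[edge]:
            acc_without_op = validate({edge: op})  # mask only `op` on `edge`
            drop = base_acc - acc_without_op
            if drop > worst_drop:                  # largest drop => most important op
                best_op, worst_drop = op, drop
        selected[edge] = best_op
        # Discretize this edge to `best_op`, then let the supernet recover before the next edge.
        tune(tune_epochs)
    return selected

if __name__ == "__main__":
    # Toy demo with a fake supernet in which "sep_conv_3x3" matters most on every edge.
    def fake_validate(masked: Dict[str, str]) -> float:
        return 80.0 if "sep_conv_3x3" in masked.values() else 90.0

    def fake_tune(epochs: int) -> None:
        pass  # a real implementation would run `epochs` epochs of supernet training here

    ops = ["sep_conv_3x3", "skip_connect", "max_pool_3x3"]
    arch = select_architecture(
        edges=["edge_0_1", "edge_0_2"],
        candidate_ops={"edge_0_1": ops, "edge_0_2": ops},
        validate=fake_validate,
        tune=fake_tune,
    )
    print(arch)  # {'edge_0_1': 'sep_conv_3x3', 'edge_0_2': 'sep_conv_3x3'}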