NAS evaluation is frustratingly hard

Authors: Antoine Yang, Pedro M. Esperança, Fabio M. Carlucci

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our first contribution is a benchmark of 8 NAS methods on 5 datasets. To overcome the hurdle of comparing methods with different search spaces, we propose using a method's relative improvement over the randomly sampled average architecture, which effectively removes advantages arising from expertly engineered search spaces or training protocols. Surprisingly, we find that many NAS techniques struggle to significantly beat the average architecture baseline. We perform further experiments with the commonly used DARTS search space in order to understand the contribution of each component in the NAS pipeline. (A minimal sketch of this relative-improvement metric follows the table.)
Researcher Affiliation | Collaboration | Antoine Yang, École Polytechnique, France; Pedro M. Esperança, Huawei Noah's Ark Lab, London, UK; Fabio Maria Carlucci, Huawei Noah's Ark Lab, London, UK. This work was done when the first author was an intern at Huawei Noah's Ark Lab, London, United Kingdom.
Pseudocode | No | The paper describes methods and protocols in prose, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code used is available at https://github.com/antoyang/NAS-Benchmark.
Open Datasets | Yes | The datasets used are CIFAR10, CIFAR100, SPORT8, MIT67 and FLOWERS102. The CIFAR10 dataset (Krizhevsky, 2009)... The CIFAR100 dataset (Krizhevsky, 2009)... SPORT8... (Li & Fei-Fei, 2007). MIT67... (Quattoni & Torralba, 2009). FLOWERS102... (Nilsback & Zisserman, 2008).
Dataset Splits | Yes | For CIFAR10/100, each dataset is split into training, validation and testing subsets of size 25,000, 25,000 and 10,000 respectively. For SPORT8, MIT67 and FLOWERS102, each dataset is split into training, validation and testing subsets with proportions 40/40/20 (%). (An illustrative split recipe follows the table.)
Hardware Specification | Yes | All experiments were run on NVIDIA Tesla V100 GPUs.
Software Dependencies | Yes | The released code notably updates the official implementation to a PyTorch version later than 0.4.
Experiment Setup | Yes | Table 2 lists the hyperparameters for the different NAS methods, and Appendix A.1 (Methods and Hyperparameters) provides detailed settings for optimizers, learning rates, epochs, batch sizes, and additional enhancements such as Cutout, Drop Path and Auxiliary Towers.
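
The relative-improvement metric cited in the Research Type row compares a searched architecture against the average of architectures sampled uniformly at random from the same search space. The snippet below is a minimal sketch of that comparison, assuming the metric is the percentage gain in test accuracy over the random-sampling average; the function name and the example numbers are ours, not taken from the released code.

```python
def relative_improvement(method_acc: float, random_avg_acc: float) -> float:
    """Percentage gain of a NAS method's test accuracy over the mean test
    accuracy of architectures sampled uniformly from the same search space.
    A value close to zero means the method barely beats random sampling."""
    return 100.0 * (method_acc - random_avg_acc) / random_avg_acc


# Hypothetical numbers for illustration only: a searched architecture at 97.1%
# against a 96.7% random-architecture average gives roughly a 0.41% relative gain.
print(relative_improvement(97.1, 96.7))
```

Framing results this way removes the advantage a method gains purely from an expertly engineered search space or training protocol, which is the comparison issue the paper highlights.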
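
The CIFAR split described in the Dataset Splits row (25,000 training, 25,000 validation and 10,000 test images) can be reproduced with standard tooling. The sketch below uses torchvision and a fixed seed purely for illustration; the released NAS-Benchmark code may implement the split differently.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Load the official CIFAR10 training set (50,000 images) and test set (10,000 images).
train_full = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root="data", train=False, download=True,
                            transform=transforms.ToTensor())

# Split the 50,000 training images into 25,000 for training/search and
# 25,000 for validation, matching the sizes reported in the paper.
train_set, val_set = random_split(train_full, [25000, 25000],
                                  generator=torch.Generator().manual_seed(0))

print(len(train_set), len(val_set), len(test_set))  # 25000 25000 10000
```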