Zero-Cost Proxies for Lightweight NAS

Authors: Mohamed S. Abdelfattah, Abhinav Mehrotra, Łukasz Dudziak, Nicholas Donald Lane

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we evaluate conventional reduced-training proxies and quantify how well they preserve ranking between neural network models during search when compared with the rankings produced by final trained accuracy. We propose a series of zero-cost proxies... Our zero-cost proxies use 3 orders of magnitude less computation but can match and even outperform conventional proxies. (Section 4: Empirical Evaluation of Proxy Tasks; see the rank-correlation sketch below the table.)
Researcher Affiliation | Collaboration | Mohamed S. Abdelfattah (1), Abhinav Mehrotra (1), Łukasz Dudziak (1), Nicholas D. Lane (1,2); (1) Samsung AI Center, Cambridge; (2) University of Cambridge. Contact: mohamed1.a@samsung.com
Pseudocode | No | The paper describes its methods in prose and mathematical formulas, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is made public at: https://github.com/mohsaied/zero-cost-nas. (See the illustrative proxy sketch below the table.)
Open Datasets | Yes | NAS-Bench-201 on CIFAR-10, NAS-Bench-201 on CIFAR-100, CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011), and ~200 models for ImageNet1k (Deng et al., 2009). NAS-Bench-ASR... evaluated on the TIMIT dataset (Garofolo et al., 1993). NAS-Bench-101... with over 423k CNN models and training statistics on CIFAR-10 (Ying et al., 2019).
Dataset Splits | Yes | Figure 2: Correlation of validation accuracy to final test accuracy during the first 12 epochs of training for three datasets on the NAS-Bench-201 search space. The full-configuration training of NAS-Bench-201 on CIFAR-10 uses input resolution r=32, number of channels in the stem convolution c=16, and number of epochs e=200.
Hardware Specification | Yes | We used an Nvidia GeForce GTX 1080 Ti and ran a random sample of 10 models for 10 epochs to get an average time-per-epoch for each proxy at different batch sizes. (See the timing sketch below the table.)
Software Dependencies | No | The paper mentions software such as PyTorchCV and the REINFORCE algorithm, but does not provide specific version numbers for any software dependencies (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | In Table 6 we list the hyper-parameters used in training the EcoNAS proxies to produce Figure 1. The only difference to the standard NAS-Bench-201 training pipeline (Dong & Yang, 2020) is our use of fewer epochs for the learning-rate annealing schedule: we anneal the learning rate to zero over 40 epochs instead of 200. This is a common technique for speeding up convergence when training proxies (Zhou et al., 2020). Table 6 (EcoNAS training hyper-parameters for NAS-Bench-201): optimizer SGD (Nesterov); initial LR 0.1; final LR 0; momentum 0.9; LR schedule cosine; weight decay 0.0005; epochs 40; batch size 256; augmentation random flip (p=0.5) and random crop. For all NAS experiments, we repeat experiments 32 times and plot the median, shading between the lower/upper quartiles. (See the training-setup sketch below the table.)
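
The Research Type row above quotes the paper's core claim: a proxy is useful to the extent that it preserves the ranking of architectures relative to their final trained accuracy. Below is a minimal sketch, assuming SciPy is available, of how such ranking agreement is commonly measured with Spearman's rank correlation; the proxy scores and accuracies are made-up placeholder values, not results from the paper.

# Rank agreement between a proxy score and final trained accuracy (Spearman's rho).
from scipy import stats

# Placeholder values: entry i of each list refers to the same candidate architecture.
proxy_scores = [0.12, 0.85, 0.33, 0.64, 0.27]
final_accuracies = [71.2, 93.8, 84.1, 90.5, 80.3]

rho, p_value = stats.spearmanr(proxy_scores, final_accuracies)
print(f"Spearman rank correlation: {rho:.3f} (p = {p_value:.3g})")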
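
The repository linked under Open Source Code holds the authors' reference implementations of the zero-cost proxies. The sketch below does not use that repository's API; it is an independent, minimal PyTorch rendition of one proxy the paper names, grad_norm (the summed norm of parameter gradients after a single forward/backward pass on one minibatch), with a throwaway model and random tensors standing in for a sampled architecture and a CIFAR-sized batch.

import torch
import torch.nn as nn

def grad_norm_proxy(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """Score an untrained model by the summed L2 norm of its parameter gradients
    after one forward/backward pass on a single minibatch."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    return sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None)

# Throwaway stand-ins for a candidate architecture and one CIFAR-sized minibatch.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x = torch.randn(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))
print(grad_norm_proxy(model, x, y))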
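
The Hardware Specification row describes averaging time-per-epoch over a random sample of models at different batch sizes on a GTX 1080 Ti. The paper does not publish the timing harness, so the following is only a hedged sketch of how such a measurement could be scripted in PyTorch; time_per_epoch and its arguments are illustrative names, not the authors' code.

import time
import torch

def time_per_epoch(model, loader, optimizer, criterion, device="cuda", epochs=10):
    """Average wall-clock seconds per training epoch for one model and one batch size."""
    model.to(device)
    times = []
    for _ in range(epochs):
        if device == "cuda":
            torch.cuda.synchronize()  # make sure pending GPU work does not skew the timer
        start = time.perf_counter()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        if device == "cuda":
            torch.cuda.synchronize()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)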
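
The Experiment Setup row reproduces the EcoNAS training hyper-parameters from Table 6. Below is a minimal sketch of that configuration in PyTorch/torchvision, assuming CIFAR-10; the placeholder linear model stands in for a sampled NAS-Bench-201 architecture, and the crop padding value is an assumption, since Table 6 only says "random crop".

import torch
import torchvision
import torchvision.transforms as T

# Augmentation from Table 6: random flip (p=0.5) and random crop.
transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomCrop(32, padding=4),  # padding=4 is an assumption; Table 6 only says "random crop"
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

# Placeholder standing in for a sampled NAS-Bench-201 architecture.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

# SGD with Nesterov momentum 0.9, initial LR 0.1, weight decay 0.0005.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=0.0005, nesterov=True)
# Cosine schedule annealing the learning rate to zero over 40 epochs instead of 200.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=40, eta_min=0.0)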