Zero-Cost Proxies for Lightweight NAS
Authors: Mohamed S Abdelfattah, Abhinav Mehrotra, Łukasz Dudziak, Nicholas Donald Lane
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we evaluate conventional reduced-training proxies and quantify how well they preserve ranking between neural network models during search when compared with the rankings produced by final trained accuracy. We propose a series of zero-cost proxies... Our zero-cost proxies use 3 orders of magnitude less computation but can match and even outperform conventional proxies. (Section 4: Empirical Evaluation of Proxy Tasks.) A rank-correlation sketch illustrating this evaluation follows the table. |
| Researcher Affiliation | Collaboration | Mohamed S. Abdelfattah¹, Abhinav Mehrotra¹, Łukasz Dudziak¹, Nicholas D. Lane¹,² (¹Samsung AI Center, Cambridge; ²University of Cambridge). Contact: mohamed1.a@samsung.com |
| Pseudocode | No | The paper describes methods using prose and mathematical formulas, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is made public at: https://github.com/mohsaied/zero-cost-nas. |
| Open Datasets | Yes | NAS-Bench-201 on CIFAR-10 and CIFAR-100; CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011); and ~200 models for ImageNet1k (Deng et al., 2009). NAS-Bench-ASR... evaluated on the TIMIT dataset (Garofolo et al., 1993). NAS-Bench-101... with over 423k CNN models and training statistics on CIFAR-10 (Ying et al., 2019). |
| Dataset Splits | Yes | Figure 2: Correlation of validation accuracy to final test accuracy during the first 12 epochs of training for three datasets on the NAS-Bench-201 search space. The full configuration training of NAS-Bench-201 on CIFAR-10 uses input resolution r=32, number of channels in the stem convolution c=16 and number of epochs e=200 |
| Hardware Specification | Yes | We used an Nvidia GeForce GTX 1080 Ti and ran a random sample of 10 models for 10 epochs to get an average time-per-epoch for each proxy at different batch sizes. A timing sketch illustrating this measurement follows the table. |
| Software Dependencies | No | The paper mentions software like 'PyTorchCV' and the 'REINFORCE algorithm' but does not provide specific version numbers for any software dependencies (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | In Table 6 we list the hyper-parameters used in training the EcoNAS proxies to produce Figure 1. The only difference to the standard NAS-Bench-201 training pipeline (Dong & Yang, 2020) is our use of fewer epochs for the learning rate annealing schedule: we anneal the learning rate to zero over 40 epochs instead of 200. This is a common technique used to speed up convergence when training proxies (Zhou et al., 2020). Table 6: EcoNAS training hyper-parameters for NAS-Bench-201: optimizer SGD (Nesterov, momentum 0.9); initial LR 0.1; final LR 0; LR schedule cosine; weight decay 0.0005; epochs 40; batch size 256; augmentation random flip (p=0.5) and random crop. For all NAS experiments, we repeat experiments 32 times and plot the median and shade between the lower/upper quartiles. A training-configuration sketch based on these hyper-parameters follows the table. |
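The ranking evaluation quoted under "Research Type" amounts to comparing the ordering a zero-cost proxy assigns to candidate architectures against the ordering given by their final trained accuracy. Below is a minimal sketch of such a rank-correlation check, assuming per-model proxy scores and final accuracies are already available (e.g. queried from NAS-Bench-201); the names `proxy_scores`/`final_accuracies` and the toy numbers are illustrative, not values or identifiers from the paper or its repository.

```python
# Minimal sketch: rank correlation between zero-cost proxy scores and final accuracy.
# Assumes scores and accuracies have already been collected for a set of architectures.
from scipy import stats

def rank_correlation(proxy_scores, final_accuracies):
    """Spearman rho and Kendall tau between a proxy's ranking of models
    and the ranking induced by final trained accuracy."""
    rho, _ = stats.spearmanr(proxy_scores, final_accuracies)
    tau, _ = stats.kendalltau(proxy_scores, final_accuracies)
    return rho, tau

# Toy example (numbers are illustrative, not results from the paper):
scores = [0.12, 0.98, 0.45, 0.67]   # zero-cost proxy score per architecture
accs   = [61.3, 93.2, 71.8, 88.5]   # final test accuracy per architecture
rho, tau = rank_correlation(scores, accs)
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```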
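The hardware note above describes averaging wall-clock training time per epoch over a small sample of models at different batch sizes on a single GPU. The sketch below shows one way such a measurement could be taken, assuming a CUDA GPU is available; `model` and `train_set` are placeholders, and this is not the authors' benchmarking code.

```python
# Minimal sketch: average wall-clock time per training epoch at a given batch size.
import time
import torch
from torch.utils.data import DataLoader

def avg_epoch_time(model, train_set, batch_size, epochs=10, device="cuda"):
    model = model.to(device)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(epochs):
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
    torch.cuda.synchronize()
    return (time.time() - start) / epochs  # average seconds per epoch
```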
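The Table 6 hyper-parameters quoted under "Experiment Setup" map directly onto a standard PyTorch training loop: SGD with Nesterov momentum 0.9 and weight decay 5e-4, initial LR 0.1 annealed to 0 with a cosine schedule over 40 epochs, batch size 256, and random crop plus flip augmentation. The sketch below assembles that recipe; `model` and `train_set` are assumed placeholders, and this is an illustration of the quoted configuration rather than the authors' training code.

```python
# Minimal sketch of the EcoNAS proxy training recipe from Table 6.
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

# Augmentation from Table 6: random crop and random flip (apply when building
# the CIFAR-10 train_set, e.g. torchvision.datasets.CIFAR10(transform=...)).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

def train_econas_proxy(model, train_set, epochs=40):
    loader = DataLoader(train_set, batch_size=256, shuffle=True)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9,
        weight_decay=5e-4, nesterov=True,
    )
    # Cosine annealing of the LR from 0.1 down to 0 over 40 epochs
    # (instead of the 200 epochs used in the full NAS-Bench-201 pipeline).
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=0.0,
    )
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```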