K-shot NAS: Learnable Weight-Sharing for NAS with K-shot Supernets

Authors: Xiu Su, Shan You, Mingkai Zheng, Fei Wang, Chen Qian, Changshui Zhang, Chang Xu

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets validate that K-shot NAS significantly improves the evaluation accuracy of paths and thus brings in impressive performance improvements.
Researcher Affiliation | Collaboration | 1) School of Computer Science, Faculty of Engineering, The University of Sydney, Australia; 2) SenseTime Research; 3) Department of Automation, Tsinghua University, Institute for Artificial Intelligence, Tsinghua University (THUAI), Beijing National Research Center for Information Science and Technology (BNRist).
Pseudocode | Yes | Algorithm 1: Training and search with K-shot supernets.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository for the described methodology.
Open Datasets | Yes | "Dataset. We perform the architecture search on the large-scale dataset ImageNet (ILSVRC-12) (Russakovsky et al., 2015), which contains 1.28M training images from 1000 categories. Specifically, following (Guo et al., 2020b), we randomly sample 50K images from the training set as the local validation set, with the rest images used for training. Finally, we report the accuracy of our searched architecture on the test dataset (which is the public validation set of the original ILSVRC2012 ImageNet dataset)." and "NAS-Bench-201 (Dong & Yang, 2020) is a NAS benchmark that contains 15625 architectures and provides the train-from-scratch performances of these architectures evaluated on ImageNet-16-120, CIFAR-100, and CIFAR-10."
Dataset Splits | Yes | Specifically, following (Guo et al., 2020b), we randomly sample 50K images from the training set as the local validation set, with the rest images used for training.
Hardware Specification | Yes | All experiments are implemented with PyTorch (Paszke et al., 2019) and trained on 8 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | "All experiments are implemented with PyTorch (Paszke et al., 2019)..." (PyTorch version is not specified, nor are other software dependencies with versions).
Experiment Setup | Yes | We adopt bs and τ as 16 and 0.3 for Eq.(9) and Eq.(15), respectively. For training K-shot supernets, we follow the same training recipe as (You et al., 2020; Guo et al., 2020b) for a fair comparison. With a batch size of 1024, the supernets are trained using a SGD optimizer with 0.9 momentum and Nesterov acceleration. The learning rate is initialized as 0.12 and decay with cosine annealing for 120 epochs.
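The Pseudocode row points to Algorithm 1 (training and search with K-shot supernets). As a rough illustration of the core idea behind K-shot weight sharing, the sketch below keeps K copies of a convolution weight and combines them with a learnable convex combination (a simplex code produced by a small "simplex-net", softened by the temperature τ = 0.3 quoted in the Experiment Setup row). All class and variable names here are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KShotConv(nn.Module):
    """Sketch of K-shot weight sharing for a single conv operation.

    K copies of the convolution weight are kept; the weight used for a
    sampled path is a convex combination of the K copies, with
    coefficients `lam` supplied by a simplex-net. Not the authors' code.
    """
    def __init__(self, k, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.weights = nn.Parameter(
            torch.randn(k, out_ch, in_ch, kernel_size, kernel_size) * 0.01
        )
        self.stride, self.padding = stride, padding

    def forward(self, x, lam):
        # lam: (k,) non-negative coefficients summing to 1 (simplex code)
        w = torch.einsum("k,koihw->oihw", lam, self.weights)
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)

class SimplexNet(nn.Module):
    """Tiny illustrative simplex-net mapping an architecture encoding to lambda."""
    def __init__(self, arch_dim, k, tau=0.3):
        super().__init__()
        self.fc = nn.Linear(arch_dim, k)
        self.tau = tau

    def forward(self, arch_code):
        return F.softmax(self.fc(arch_code) / self.tau, dim=-1)
```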
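The Dataset Splits row describes holding out 50K randomly sampled images from the ImageNet training set as a local validation set, with the rest used for training. A minimal sketch of such a split with torchvision, assuming a standard ImageFolder layout (the path and seed are placeholders, not values from the paper):

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Placeholder path and transform; the paper does not specify these details.
train_full = datasets.ImageFolder("/path/to/imagenet/train",
                                  transform=transforms.ToTensor())

# Randomly hold out 50K images as a local validation set, as described in
# the quoted passage; the remaining images are used for supernet training.
g = torch.Generator().manual_seed(0)  # seed is an assumption
perm = torch.randperm(len(train_full), generator=g)
val_idx, train_idx = perm[:50_000], perm[50_000:]
local_val = Subset(train_full, val_idx.tolist())
local_train = Subset(train_full, train_idx.tolist())
```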
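The Experiment Setup row specifies the supernet training recipe (SGD with 0.9 momentum and Nesterov acceleration, initial learning rate 0.12, cosine annealing over 120 epochs, batch size 1024). Read literally, it maps to the PyTorch configuration below; `supernet` and the training loop body are placeholders, and the 1024-image batch would in practice be spread across the 8 V100 GPUs noted in the Hardware Specification row.

```python
import torch

# Placeholder model; the K-shot supernets themselves are defined elsewhere.
supernet = torch.nn.Linear(10, 10)

optimizer = torch.optim.SGD(
    supernet.parameters(),
    lr=0.12,          # initial learning rate from the quoted setup
    momentum=0.9,
    nesterov=True,
)

epochs = 120
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one training epoch over ImageNet with an effective batch size of 1024 ...
    scheduler.step()
```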