NASI: Label- and Data-agnostic Neural Architecture Search at Initialization

Authors: Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi, Bryan Kian Hsiang Low

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first empirically demonstrate the improved search efficiency and the competitive search effectiveness achieved by NASI on NAS-Bench-1Shot1 (Zela et al., 2020b) (Sec. 5.1). Compared with other NAS algorithms, NASI incurs the smallest search cost while preserving the competitive performance of its selected architectures. Meanwhile, the architectures selected by NASI from the DARTS (Liu et al., 2019) search space on CIFAR-10 consistently achieve competitive or even superior performance when evaluated on different benchmark datasets, e.g., CIFAR-10/100 and ImageNet (Sec. 5.2). In Sec. 5.3, NASI is further shown to select well-performing architectures on CIFAR-10 even with randomly generated labels or data, which strongly supports its label- and data-agnostic search and therefore the guaranteed transferability of NASI.
Researcher Affiliation | Academia | Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi & Bryan Kian Hsiang Low, Department of Computer Science, National University of Singapore ({shuyao,shaofeng,daizhongxiang,ooibc,lowkh}@comp.nus.edu.sg)
Pseudocode | Yes | Algorithm 1: NAS at Initialization (NASI)
Open Source Code | Yes | Meanwhile, to guarantee the reproducibility of the empirical results in this paper, we have provided our code in the supplementary materials and detailed experimental settings in Appendix B.
Open Datasets | Yes | We first validate the search efficiency and effectiveness of our NASI in the three search spaces of NAS-Bench-1Shot1 (Zela et al., 2020b) on CIFAR-10. Compared with other NAS algorithms, NASI incurs the smallest search cost while preserving the competitive performance of its selected architectures. Meanwhile, the architectures selected by NASI from the DARTS (Liu et al., 2019) search space on CIFAR-10 consistently achieve competitive or even superior performance when evaluated on different benchmark datasets, e.g., CIFAR-10/100 and ImageNet (Sec. 5.2).
Dataset Splits | No | The paper mentions "training and validation loss" in its problem reformulation (Section 3.1) and refers to general training settings in Appendix B.4. However, it does not explicitly state dataset split percentages or sample counts for validation, instead relying on standard practices for benchmark datasets such as CIFAR-10/100 and ImageNet without providing the specific details in the paper.
Hardware Specification | Yes | The final selected architectures are then trained via stochastic gradient descent (SGD) for 600 epochs with a learning rate cosine-scheduled from 0.025 to 0, momentum 0.9, weight decay 3×10⁻⁴, and batch size 96 on a single Nvidia 2080Ti GPU. Following P-DARTS (Chen et al., 2019) and SDARTS-ADV (Chen & Hsieh, 2020), we train the model from scratch for 250 epochs with a batch size of 1024 on 8 Nvidia 2080Ti GPUs.
Software Dependencies | No | The paper does not explicitly state software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). It refers to the use of a Neural Tangent Kernel and various deep learning techniques, implying reliance on common ML frameworks, but without specific versioning.
Experiment Setup | Yes | Following DARTS (Liu et al., 2019), the final selected architectures consist of 20 searched cells: 18 identical normal cells and 2 identical reduction cells. An auxiliary tower with weight 0.4 is located at the 13th cell of the final selected architectures, and the number of initial channels is set to 36. The final selected architectures are then trained via stochastic gradient descent (SGD) for 600 epochs with a learning rate cosine-scheduled from 0.025 to 0, momentum 0.9, weight decay 3×10⁻⁴, and batch size 96 on a single Nvidia 2080Ti GPU. Cutout (Devries & Taylor, 2017) and Scheduled Drop Path linearly increased from 0 to 0.2 are also employed for regularization purposes.
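
The quoted evaluation-phase training recipe can be summarized in a short configuration sketch. The snippet below is illustrative only and assumes a PyTorch implementation (the paper does not name its framework, per the Software Dependencies row); the function names and the `model` placeholder are hypothetical, not the authors' code. Cutout would additionally be applied as a CIFAR-10 data-augmentation transform, with batch size 96 on a single GPU as quoted.

# Minimal sketch (not the authors' code) of the quoted training setup,
# assuming a PyTorch implementation. `model` stands in for the 20-cell
# architecture selected by NASI.
import torch
import torch.nn as nn

def build_optimizer_and_scheduler(model: nn.Module, epochs: int = 600):
    # SGD with initial learning rate 0.025, momentum 0.9, weight decay 3e-4.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4
    )
    # Cosine schedule annealing the learning rate from 0.025 down to 0
    # over the 600 training epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=0.0
    )
    return optimizer, scheduler

def drop_path_prob(epoch: int, epochs: int = 600, max_prob: float = 0.2) -> float:
    # Scheduled Drop Path probability, linearly increased from 0 to 0.2.
    return max_prob * epoch / epochs

def total_loss(criterion: nn.Module, logits, logits_aux, target,
               aux_weight: float = 0.4):
    # Main classification loss plus the auxiliary-tower loss weighted by 0.4.
    return criterion(logits, target) + aux_weight * criterion(logits_aux, target)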