NASI: Label- and Data-agnostic Neural Architecture Search at Initialization

Authors: Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi, Bryan Kian Hsiang Low

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first empirically demonstrate the improved search efficiency and the competitive search effectiveness achieved by NASI on NAS-Bench-1Shot1 (Zela et al., 2020b) (Sec. 5.1). Compared with other NAS algorithms, NASI incurs the smallest search cost while preserving the competitive performance of its selected architectures. Meanwhile, the architectures selected by NASI from the DARTS (Liu et al., 2019) search space on CIFAR-10 consistently achieve competitive or even superior performance when evaluated on different benchmark datasets, e.g., CIFAR-10/100 and ImageNet (Sec. 5.2). In Sec. 5.3, NASI is further shown to select well-performing architectures on CIFAR-10 even with randomly generated labels or data, which strongly supports its label- and data-agnostic search and therefore the guaranteed transferability of NASI.
Researcher Affiliation | Academia | Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi & Bryan Kian Hsiang Low, Department of Computer Science, National University of Singapore ({shuyao,shaofeng,daizhongxiang,ooibc,lowkh}@comp.nus.edu.sg)
Pseudocode | Yes | Algorithm 1: NAS at Initialization (NASI)
Open Source Code | Yes | Meanwhile, to guarantee the reproducibility of the empirical results in this paper, we have provided our code in the supplementary materials and detailed experimental settings in Appendix B.
Open Datasets | Yes | We first validate the search efficiency and effectiveness of our NASI in the three search spaces of NAS-Bench-1Shot1 (Zela et al., 2020b) on CIFAR-10. Compared with other NAS algorithms, NASI incurs the smallest search cost while preserving the competitive performance of its selected architectures. Meanwhile, the architectures selected by NASI from the DARTS (Liu et al., 2019) search space on CIFAR-10 consistently achieve competitive or even superior performance when evaluated on different benchmark datasets, e.g., CIFAR-10/100 and ImageNet (Sec. 5.2).
Dataset Splits | No | The paper mentions "training and validation loss" in its problem reformulation (Section 3.1) and refers to general training settings in Appendix B.4. However, it does not explicitly state dataset split percentages or sample counts for validation, instead relying on standard practices for benchmark datasets such as CIFAR-10/100 and ImageNet without providing the specific details in the paper.
Hardware Specification | Yes | The final selected architectures are then trained via stochastic gradient descent (SGD) for 600 epochs with a learning rate cosine-scheduled from 0.025 to 0, momentum 0.9, weight decay 3×10⁻⁴, and batch size 96 on a single Nvidia 2080Ti GPU. Following P-DARTS (Chen et al., 2019) and SDARTS-ADV (Chen & Hsieh, 2020), we train the model from scratch for 250 epochs with a batch size of 1024 on 8 Nvidia 2080Ti GPUs.
Software Dependencies | No | The paper does not explicitly state software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). It refers to the use of a Neural Tangent Kernel and various deep learning techniques, implying reliance on common ML frameworks, but without specific versioning.
Experiment Setup | Yes | Following DARTS (Liu et al., 2019), the final selected architectures consist of 20 searched cells: 18 identical normal cells and 2 identical reduction cells. An auxiliary tower with weight 0.4 is located at the 13th cell of the final selected architectures, and the number of initial channels is set to 36. The final selected architectures are then trained via stochastic gradient descent (SGD) for 600 epochs with a learning rate cosine-scheduled from 0.025 to 0, momentum 0.9, weight decay 3×10⁻⁴, and batch size 96 on a single Nvidia 2080Ti GPU. Cutout (Devries & Taylor, 2017) and Scheduled Drop Path linearly increased from 0 to 0.2 are also employed for regularization purposes.
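
The quoted evaluation-phase training recipe can be summarized in a short configuration sketch. The snippet below is illustrative only and assumes a PyTorch implementation (the paper does not name its framework, per the Software Dependencies row); the function names and the `model` placeholder are hypothetical, not the authors' code. Cutout would additionally be applied as a CIFAR-10 data-augmentation transform, with batch size 96 on a single GPU as quoted.

# Minimal sketch (not the authors' code) of the quoted training setup,
# assuming a PyTorch implementation. `model` stands in for the 20-cell
# architecture selected by NASI.
import torch
import torch.nn as nn

def build_optimizer_and_scheduler(model: nn.Module, epochs: int = 600):
    # SGD with initial learning rate 0.025, momentum 0.9, weight decay 3e-4.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4
    )
    # Cosine schedule annealing the learning rate from 0.025 down to 0
    # over the 600 training epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=0.0
    )
    return optimizer, scheduler

def drop_path_prob(epoch: int, epochs: int = 600, max_prob: float = 0.2) -> float:
    # Scheduled Drop Path probability, linearly increased from 0 to 0.2.
    return max_prob * epoch / epochs

def total_loss(criterion: nn.Module, logits, logits_aux, target,
               aux_weight: float = 0.4):
    # Main classification loss plus the auxiliary-tower loss weighted by 0.4.
    return criterion(logits, target) + aux_weight * criterion(logits_aux, target)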