NASI: Label- and Data-agnostic Neural Architecture Search at Initialization
Authors: Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi, Bryan Kian Hsiang Low
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first empirically demonstrate the improved search efficiency and the competitive search effectiveness achieved by NASI on NAS-Bench-1Shot1 (Zela et al., 2020b) (Sec. 5.1). Compared with other NAS algorithms, NASI incurs the smallest search cost while preserving the competitive performance of its selected architectures. Meanwhile, the architectures selected by NASI from the DARTS (Liu et al., 2019) search space over CIFAR-10 consistently achieve competitive or even superior performance when evaluated on different benchmark datasets, e.g., CIFAR-10/100 and ImageNet (Sec. 5.2). In Sec. 5.3, NASI is further demonstrated to be able to select well-performing architectures on CIFAR-10 even with randomly generated labels or data, which strongly supports the label- and data-agnostic search and therefore the guaranteed transferability achieved by our NASI. |
| Researcher Affiliation | Academia | Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi & Bryan Kian Hsiang Low Department of Computer Science, National University of Singapore {shuyao,shaofeng,daizhongxiang,ooibc,lowkh}@comp.nus.edu.sg |
| Pseudocode | Yes | Algorithm 1 NAS at Initialization (NASI) (a hedged sketch of the training-free scoring idea follows the table) |
| Open Source Code | Yes | Meanwhile, to guarantee the reproducibility of the empirical results in this paper, we have provided our codes in the supplementary materials and detailed experimental settings in Appendix B. |
| Open Datasets | Yes | We first validate the search efficiency and effectiveness of our NASI in the three search spaces of NAS-Bench-1Shot1 (Zela et al., 2020b) on CIFAR-10. Compared with other NAS algorithms, NASI incurs the smallest search cost while preserving the competitive performance of its selected architectures. Meanwhile, the architectures selected by NASI from the DARTS (Liu et al., 2019) search space over CIFAR-10 consistently achieve competitive or even superior performance when evaluated on different benchmark datasets, e.g., CIFAR-10/100 and ImageNet (Sec. 5.2). |
| Dataset Splits | No | The paper mentions "training and validation loss" in its problem reformulation (Section 3.1) and refers to general training settings in Appendix B.4. However, it does not explicitly state the dataset split percentages or sample counts for validation, relying on standard practices for benchmark datasets like CIFAR-10/100 and ImageNet, but without providing the specific details within the paper. |
| Hardware Specification | Yes | The final selected architectures are then trained via stochastic gradient descent (SGD) for 600 epochs with a learning rate cosine-scheduled from 0.025 to 0, momentum 0.9, weight decay 3 × 10⁻⁴, and batch size 96 on a single Nvidia 2080Ti GPU. Following P-DARTS (Chen et al., 2019) and SDARTS-ADV (Chen & Hsieh, 2020), we train the model from scratch for 250 epochs with a batch size of 1024 on 8 Nvidia 2080Ti GPUs. |
| Software Dependencies | No | The paper does not explicitly state software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). It refers to the use of a Neural Tangent Kernel and various deep learning techniques, implying reliance on common ML frameworks, but without specific versioning. |
| Experiment Setup | Yes | Following DARTS (Liu et al., 2019), the final selected architectures consist of 20 searched cells: 18 of them are identical normal cells and 2 of them are identical reduction cells. An auxiliary tower with weight 0.4 is located at the 13th cell of the final selected architectures, and the number of initial channels is set to 36. The final selected architectures are then trained via stochastic gradient descent (SGD) for 600 epochs with a learning rate cosine-scheduled from 0.025 to 0, momentum 0.9, weight decay 3 × 10⁻⁴, and batch size 96 on a single Nvidia 2080Ti GPU. Cutout (DeVries & Taylor, 2017) and Scheduled DropPath, linearly increased from 0 to 0.2, are also employed for regularization. (A hedged sketch of this training configuration follows the table.) |
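
Algorithm 1 itself is not reproduced in this report. As a rough, hedged illustration of the training-free scoring it describes, the sketch below evaluates a randomly initialized candidate with a mini-batch gradient-norm proxy for the NTK trace, using random data and random labels in line with the paper's label- and data-agnostic claim. The toy `nn.Sequential` candidate, the batch size, and the scalar proxy itself are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): score a randomly
# initialized candidate architecture with a mini-batch gradient-norm proxy
# for the NTK trace, using random data and random labels.
import torch
import torch.nn as nn
import torch.nn.functional as F


def ntk_trace_proxy(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Squared gradient norm of the loss at initialization (a rough trace proxy)."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return sum(g.pow(2).sum().item() for g in grads)


if __name__ == "__main__":
    # Hypothetical candidate network; in NAS this would come from the search space.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )
    x = torch.randn(16, 3, 32, 32)       # random data (data-agnostic)
    y = torch.randint(0, 10, (16,))      # random labels (label-agnostic)
    print(ntk_trace_proxy(model, x, y))  # larger score read here as "more trainable"
```

Such a score can be computed for many sampled candidates without any training, which is what makes the overall search cost so small.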
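
The evaluation-phase hyperparameters quoted above can be summarized in a few lines of PyTorch. The sketch below only restates those settings and is not the authors' released code: `model`, `train_loader`, and `train_one_epoch` are hypothetical placeholders, and Cutout, Scheduled DropPath, and the auxiliary tower (weight 0.4) are deliberately omitted.

```python
# Sketch of the quoted evaluation-phase training configuration:
# SGD with momentum 0.9, weight decay 3e-4, batch size 96, and a learning
# rate cosine-annealed from 0.025 to 0 over 600 epochs.
import torch

EPOCHS = 600

def make_optimizer_and_scheduler(model: torch.nn.Module):
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=EPOCHS, eta_min=0.0)
    return optimizer, scheduler

# Typical use (one scheduler step per epoch):
#   optimizer, scheduler = make_optimizer_and_scheduler(model)
#   for epoch in range(EPOCHS):
#       train_one_epoch(model, train_loader, optimizer)  # batches of 96
#       scheduler.step()
```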