PreNAS: Preferred One-Shot Learning Towards Efficient Neural Architecture Search

Authors: Haibin Wang, Ce Ge, Hesen Chen, Xiuyu Sun

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments have demonstrated that PreNAS consistently outperforms state-of-the-art one-shot NAS competitors for both Vision Transformer and convolutional architectures, and importantly, enables instant specialization with zero search cost.
Researcher Affiliation | Industry | Alibaba Group, Beijing, China.
Pseudocode | Yes | Algorithm 1: Greedy allocation of heads for isomer architectures (an illustrative sketch follows this table).
Open Source Code | Yes | Our code is available at https://github.com/tinyvision/PreNAS.
Open Datasets | Yes | "Comparison of different Vision Transformers on ImageNet." "We present results on CIFAR-10/100 (Krizhevsky & Hinton, 2009), Flowers-102 (Nilsback & Zisserman, 2008), Stanford Cars (Krause et al., 2013), Oxford-IIIT Pets (Parkhi et al., 2012), and iNaturalist 2019 (Horn et al., 2018)."
Dataset Splits | No | The paper mentions evaluating performance on a 'validation dataset' (Eq. 2) but does not provide specific split percentages, sample counts, or details on how the dataset was partitioned for training, validation, and testing.
Hardware Specification | Yes | We conducted experiments and measured design time on NVIDIA A100 GPUs.
Software Dependencies | No | We implemented PreNAS upon the PyTorch (Paszke et al., 2019) framework with improvements from the timm (Wightman, 2019) library. Specific version numbers for PyTorch and timm are not provided.
Experiment Setup | Yes | The input images are all resized to 224x224 and split into patches of size 16x16. We use the AdamW optimizer with a mini-batch size of 1024. The learning rate is initially set to 1e-3 and decays to 2e-5 through a cosine scheduler over 500 epochs. The discretization margin ε is set to 1M. The detailed hyper-parameter settings are presented in Tab. 9. (See the configuration sketch after this table.)
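
The paper's Algorithm 1 is only named above and its details are not quoted here. As an illustration of what a greedy head-allocation loop can look like, the following Python sketch assigns a fixed budget of attention heads across layers by repeatedly picking the layer with the highest marginal score. All names (`allocate_heads`, `layer_scores`, `total_heads`) and the scoring scheme are assumptions for illustration, not the actual PreNAS implementation.

```python
# Hypothetical sketch of a greedy head-allocation scheme; the paper's
# Algorithm 1 may differ in its scoring and constraints.
from typing import List

def allocate_heads(layer_scores: List[List[float]], total_heads: int) -> List[int]:
    """Greedily assign attention heads across layers.

    layer_scores[l][k] is an (assumed) importance score of adding the
    (k+1)-th head to layer l. The budget `total_heads` is spent one head
    at a time on the layer with the highest marginal score; a layer's
    score list length acts as its per-layer maximum.
    """
    num_layers = len(layer_scores)
    allocation = [0] * num_layers
    for _ in range(total_heads):
        best_layer, best_gain = None, float("-inf")
        for l in range(num_layers):
            k = allocation[l]
            if k < len(layer_scores[l]) and layer_scores[l][k] > best_gain:
                best_layer, best_gain = l, layer_scores[l][k]
        if best_layer is None:  # every layer is already at its maximum
            break
        allocation[best_layer] += 1
    return allocation

# Example: distribute 5 heads over 3 layers with made-up scores.
print(allocate_heads([[0.9, 0.4], [0.8, 0.7, 0.1], [0.6, 0.5]], 5))
# -> [1, 2, 2]
```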
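
The hyper-parameters quoted in the Experiment Setup row map directly onto a standard PyTorch optimizer/scheduler pair. Below is a minimal configuration sketch, assuming the usual `torch.optim` APIs; the model, data loading, and weight decay are placeholders, since the excerpt only specifies the optimizer type, batch size, learning-rate schedule, and epoch count (the full settings are in the paper's Tab. 9).

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 500        # 500 epochs, as quoted above
BATCH_SIZE = 1024   # mini-batch size of 1024
BASE_LR = 1e-3      # initial learning rate
MIN_LR = 2e-5       # learning rate at the end of the cosine decay

# `model` is a placeholder; the paper trains a weight-sharing supernet
# on 224x224 inputs split into 16x16 patches.
model = torch.nn.Linear(3 * 224 * 224, 1000)

# AdamW optimizer; weight decay is not given in the excerpt (see Tab. 9).
optimizer = AdamW(model.parameters(), lr=BASE_LR)

# Cosine schedule decaying from 1e-3 to 2e-5 over the 500 epochs.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=MIN_LR)

for epoch in range(EPOCHS):
    # ... one training epoch with mini-batches of size BATCH_SIZE ...
    scheduler.step()
```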