GENNAPE: Towards Generalized Neural Architecture Performance Estimators

Authors: Keith G. Mills, Fred X. Han, Jialin Zhang, Fabian Chudak, Ali Safari Mamaghani, Mohammad Salameh, Wei Lu, Shangling Jui, Di Niu

Venue: AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that GENNAPE, pretrained on NAS-Bench-101, can achieve superior transferability to 5 different public neural network benchmarks, including NAS-Bench-201, NAS-Bench-301, and the MobileNet and ResNet families, under no or minimum fine-tuning. We further introduce 3 challenging, newly labelled neural network benchmarks: HiAML, Inception and Two-Path, which can concentrate in narrow accuracy ranges. Extensive experiments show that GENNAPE can correctly discern high-performance architectures in these families. Finally, when paired with a search algorithm, GENNAPE can find architectures that improve accuracy while reducing FLOPs on three families.
Researcher Affiliation | Collaboration | Keith G. Mills (1,2,*), Fred X. Han (2), Jialin Zhang (3), Fabian Chudak (2), Ali Safari Mamaghani (1), Mohammad Salameh (2), Wei Lu (2), Shangling Jui (3), Di Niu (1). Affiliations: 1 Department of Electrical and Computer Engineering, University of Alberta; 2 Huawei Technologies, Edmonton, Alberta, Canada; 3 Huawei Kirin Solution, Shanghai, China. Emails: {kgmills, safarima, dniu}@ualberta.ca; {fred.xuefei.han1, fabian.chudak, mohammad.salameh, jui.shangling}@huawei.com; {zhangjialin10, robin.luwei}@hisilicon.com
Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Finally, we open-source these new benchmarks to facilitate further research on generalizable neural predictors. (Footnote: https://github.com/Ascend-Research/GENNAPE)
Open Datasets | Yes | NAS-Bench-101 (NB-101) is one of the first and largest benchmarks for NAS. It consists of 423k unique architectures, individually evaluated on CIFAR-10. The architectures are cell-based, where each cell is a Directed Acyclic Graph (DAG) containing operations, stacked repeatedly to form a network. We sample 50k random architectures from this family to form our CG training family (see the sampling sketch after this table). NAS-Bench-201 (NB-201) and NAS-Bench-301 (NB-301) are two additional benchmarks. Like NB-101, architectures consist of a fixed topology of cells, except they follow the DARTS search space. Additionally, NB-201 only contains 15.6k evaluated architectures, while NB-301 is a surrogate predictor for the DARTS search space. Therefore, we treat both as test families. ProxylessNAS (PN) (Cai, Zhu, and Han 2019) and Once-for-All MobileNetV3 (OFA-MBv3) (Cai et al. 2020) are based on the MobileNet (Howard et al. 2019) architecture families, with PN and OFA-MBv3 implementing versions 2 and 3, respectively. Once-for-All ResNet (OFA-RN) is based on the classical ResNet-50 (He et al. 2016) topology. All three evaluate on ImageNet. Architectures consist of searchable macro features where the number of blocks is variable. We refer the reader to Mills et al. (2021b) for further details regarding PN, OFA-MBv3 and OFA-RN.
Dataset Splits | Yes | We split the NB-101 architectures into a training set with 80% of the data, and two separate validation sets that contain 10% each (see the split sketch after this table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It mentions running on "Ascend Tiny Core" for some applications but not for the experiments described.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We train a CL encoder on NB-101 and use it to infer embeddings for all test families. The model trains for 40 epochs. We evaluate it on the first validation set at every epoch without applying the inverse of Equation 5, to track loss statistics on transformed labels. Once training is complete, we further evaluate the model on the second validation set, this time applying the inverse of Equation 5 to predictions to calculate performance against the ground-truth accuracy. For each scheme, we train a model 5 times using different random seeds. The overall best model is the one that achieves the highest rank correlation on the second NB-101 validation set. We provide hyperparameters and other training details in the supplementary materials. (A sketch of this selection loop follows the table.)
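
To make the quoted training-data construction concrete, the following is a minimal sketch of sampling the 50k NB-101 architectures, assuming the official nasbench package (github.com/google-research/nasbench) and its public "only108" TFRecord file; the file name, seed and exact API usage are assumptions of this sketch, not details given in the paper.

# Sketch: sample 50k NAS-Bench-101 cells for the CG training family.
# Assumes the official `nasbench` package and its downloadable TFRecord.
import random
from nasbench import api

nb = api.NASBench('nasbench_only108.tfrecord')   # ~423k unique cells
all_hashes = list(nb.hash_iterator())            # one hash per architecture

random.seed(0)                                   # assumed seed, not from the paper
train_hashes = random.sample(all_hashes, 50_000)

# Each hash resolves to a cell spec (DAG adjacency + operations) together
# with its CIFAR-10 training statistics.
fixed, computed = nb.get_metrics_from_hash(train_hashes[0])
print(fixed['module_adjacency'], fixed['module_operations'])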
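
The 80/10/10 split described under Dataset Splits can then be formed directly from that sample; variable names continue from the sketch above, and the shuffle seed is again an assumption.

# Sketch: split the 50k sampled architectures into one training set (80%)
# and two validation sets (10% each), per the quoted description.
import random

random.seed(0)                        # assumed seed
random.shuffle(train_hashes)          # `train_hashes` from the previous sketch

n = len(train_hashes)                 # 50,000
train = train_hashes[: int(0.8 * n)]                 # 40k for training
val_1 = train_hashes[int(0.8 * n): int(0.9 * n)]     # 5k, per-epoch validation
val_2 = train_hashes[int(0.9 * n):]                  # 5k, final model selection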
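
Finally, the quoted model-selection protocol (5 seeds, 40 epochs, best model chosen by rank correlation on the second validation set after inverting the paper's Equation 5 label transform) can be sketched as below. train_cl_encoder, predict, inverse_eq5 and ground_truth_acc are hypothetical stand-ins, not GENNAPE's actual implementation; only the loop structure mirrors the quoted setup, and the real transform and hyperparameters are in the paper's supplementary materials.

# Sketch of the quoted selection protocol; the named helper functions are
# hypothetical placeholders.
from scipy.stats import spearmanr

best_model, best_srcc = None, -1.0
for seed in range(5):                         # "train a model 5 times"
    model = train_cl_encoder(train, val_1,    # val_1 scored per epoch on
                             epochs=40,       # transformed labels (inverse of
                             seed=seed)       # Equation 5 NOT applied there)
    preds = inverse_eq5(predict(model, val_2))        # undo Eq. 5 on predictions
    srcc, _ = spearmanr(preds, ground_truth_acc(val_2))
    if srcc > best_srcc:                      # keep the highest rank correlation
        best_model, best_srcc = model, srcc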