MOTE-NAS: Multi-Objective Training-based Estimate for Efficient Neural Architecture Search

Authors: Yuming Zhang, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang, Chun-Chieh Lee, Kuo-Chin Fan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on NASBench-201 show MOTE-NAS achieves 94.32% accuracy on CIFAR-10, 72.81% on CIFAR-100, and 46.38% on ImageNet-16-120, outperforming NTK-based NAS approaches. The evaluation-free (EF) version of MOTE-NAS runs in only five minutes and delivers a model more accurate than KNAS.
Researcher Affiliation | Academia | Yu-Ming Zhang (National Central University), Jun-Wei Hsieh (National Yang Ming Chiao Tung University), Xin Li (University at Albany), Ming-Ching Chang (University at Albany), Chun-Chieh Lee (National Central University), Kuo-Chin Fan (National Central University)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | We have described the experimental details in the paper and supplementary materials as much as possible, and we will make the code public on GitHub.
Open Datasets | Yes | We used NASBench-101 and NASBench-201, both cell-based search spaces. NASBench-101 has 423,621 candidates trained on CIFAR-10 for 108 epochs. NASBench-201 includes 15,625 candidates trained on CIFAR-10, CIFAR-100, and ImageNet-16-120 for 200 epochs each. We also search for a promising architecture in the MobileNetV3 search space using MOTE, then train and evaluate it on ImageNet-1K. (A sketch of querying NASBench-201 follows the table.)
Dataset Splits | No | The paper mentions 'training data' and 'test accuracy' (including an early-stopping variant), but does not explicitly provide percentages, counts, or a methodology for the train/validation/test splits used in its experiments.
Hardware Specification | Yes | Computation was performed on Tesla V100 GPUs, with MOTE and MOTE-NAS costs measured specifically on a V100; ImageNet-1K training ran for 200 epochs on 10 RTX 2080 Ti GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer and a Box-Cox transformation, but does not provide specific version numbers for software libraries or dependencies used in the experiments. (A Box-Cox example follows the table.)
Experiment Setup | Yes | The hyperparameters are batch size 256, 50 epochs, learning rate 0.001 with the Adam optimizer, and a cross-entropy loss. The best architecture is then selected using the early-stopping version of test accuracy. (A minimal training-setup sketch follows the table.)
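
The Open Datasets row refers to the NASBench-201 benchmark of 15,625 cell candidates with 200-epoch ground-truth results on CIFAR-10, CIFAR-100, and ImageNet-16-120. Below is a minimal sketch of how such ground-truth accuracies are typically queried, assuming the community `nas_201_api` Python package and a locally downloaded benchmark file; the file name, the example architecture string, and the result-dictionary keys are assumptions and may differ across API versions. This is not the authors' released code.

```python
# Sketch: querying NASBench-201 ground-truth accuracies (assumes `nas_201_api`
# is installed and the benchmark .pth file has been downloaded locally).
from nas_201_api import NASBench201API

api = NASBench201API("NAS-Bench-201-v1_1-096897.pth", verbose=False)

# Look up one of the 15,625 cell candidates by its architecture string
# (example string is illustrative, not one selected by MOTE-NAS).
arch = "|nor_conv_3x3~0|+|none~0|nor_conv_3x3~1|+|skip_connect~0|none~1|avg_pool_3x3~2|"
index = api.query_index_by_arch(arch)

# Retrieve the 200-epoch statistics on each dataset named in the table row.
for dataset in ("cifar10", "cifar100", "ImageNet16-120"):
    info = api.get_more_info(index, dataset, hp="200", is_random=False)
    print(dataset, info.get("test-accuracy"))  # key name may vary by API version
```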
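
The Software Dependencies row notes a Box-Cox transformation without naming a library or where in the pipeline it is applied. The following is only an illustration of the transform itself using SciPy; the input array is a hypothetical stand-in, not data from the paper.

```python
# Illustrative Box-Cox transform with SciPy; `raw_scores` is placeholder data.
import numpy as np
from scipy.stats import boxcox

raw_scores = np.array([0.8, 1.3, 2.1, 5.4, 9.7])  # Box-Cox requires strictly positive inputs
transformed, fitted_lambda = boxcox(raw_scores)    # lambda is fitted by maximum likelihood
print(fitted_lambda, transformed)
```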
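
The Experiment Setup row lists the reported proxy-training hyperparameters. A minimal PyTorch sketch of that configuration is shown below, assuming standard `torch` APIs; `model` and `train_set` are placeholders, and this is not the authors' implementation.

```python
# Sketch of the reported setup: batch size 256, 50 epochs, Adam at lr 1e-3,
# cross-entropy loss. Early-stopping-based architecture selection is omitted.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def train_proxy(model, train_set, device="cuda"):
    loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    model.to(device).train()
    for epoch in range(50):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```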