How Powerful are Performance Predictors in Neural Architecture Search?

Authors: Colin White, Arber Zela, Binxin Ru, Yang Liu, Frank Hutter

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study 31 predictors across four popular search spaces and four datasets: NAS-Bench-201 [13] with CIFAR-10, CIFAR-100, and ImageNet16-120, NAS-Bench-101 [71] and DARTS [32] with CIFAR-10, and NAS-Bench-NLP [21] with Penn Treebank. In order to give a fair comparison among different classes of predictors, we run a full portfolio of experiments, measuring the Pearson correlation and rank correlation metrics (Spearman, Kendall Tau, and sparse Kendall Tau), across a variety of initialization time and query time budgets.
Researcher Affiliation | Collaboration | Colin White [1], Arber Zela [2], Binxin Ru [3], Yang Liu [1], Frank Hutter [2,4]; [1] Abacus.AI, [2] University of Freiburg, [3] University of Oxford, [4] Bosch Center for Artificial Intelligence
Pseudocode | Yes | We give results in Figure 3 and pseudo-code as well as additional experiments in Section B.4.
Open Source Code | Yes | Our code, featuring a library of 31 performance predictors, is available at https://github.com/automl/naslib.
Open Datasets | Yes | NAS-Bench-101 [71] consists of over 423 000 unique neural architectures with precomputed training, validation, and test accuracies after training for 4, 12, 36, and 108 epochs on CIFAR-10 [71].
Dataset Splits | Yes | NAS-Bench-101 [71] consists of over 423 000 unique neural architectures with precomputed training, validation, and test accuracies after training for 4, 12, 36, and 108 epochs on CIFAR-10 [71]. ... NAS-Bench-201 [13] consists of 15 625 architectures ... Each architecture has full learning curve information for training, validation, and test losses/accuracies for 200 epochs on CIFAR-10 [22], CIFAR-100, and ImageNet16-120 [10].
Hardware Specification | Yes | On NAS-Bench-201 CIFAR-10, the 11 initialization time budgets are spaced logarithmically from 1 second to 1.8 × 10^7 seconds on a 1080 Ti GPU (which corresponds to training 1000 random architectures on average).
Software Dependencies | No | The paper mentions its code is based on the NASLib library, but it does not provide specific version numbers for NASLib or any other software dependencies (e.g., Python, PyTorch, etc.) in the main text.
Experiment Setup | Yes | Hyperparameter tuning. Although we used the code directly from the original repositories (sometimes making changes when necessary to adapt to NAS-Bench search spaces), the predictors had significantly different levels of hyperparameter tuning. ... For each search space, we run random search on each performance predictor for 5000 iterations, with a maximum total runtime of 15 minutes. ... The hyperparameter value ranges for each predictor can be found in Section B.2.
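
The Research Type row above lists the paper's four evaluation metrics: Pearson correlation plus three rank correlations (Spearman, Kendall Tau, and sparse Kendall Tau). A minimal sketch of computing them with SciPy follows; the sparse variant here rounds accuracies to one decimal (0.1%) before ranking, which is one common reading of "sparse Kendall Tau" and is an assumption rather than the paper's exact definition.

```python
import numpy as np
from scipy import stats

def predictor_metrics(predicted, actual, sparse_decimals=1):
    """Correlation metrics for comparing predicted vs. true accuracies.

    Rounding to `sparse_decimals` before Kendall Tau is an assumed
    definition of "sparse Kendall Tau" (it ties near-equal accuracies
    so swaps between them are not penalized).
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return {
        "pearson": stats.pearsonr(predicted, actual)[0],
        "spearman": stats.spearmanr(predicted, actual)[0],
        "kendall_tau": stats.kendalltau(predicted, actual)[0],
        "sparse_kendall_tau": stats.kendalltau(
            np.round(predicted, sparse_decimals),
            np.round(actual, sparse_decimals),
        )[0],
    }

# Example: correlate a predictor's scores with true validation accuracies.
print(predictor_metrics([93.1, 91.4, 94.0, 92.2], [92.8, 91.9, 94.2, 92.0]))
```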
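
The Open Datasets and Dataset Splits rows rely on tabular benchmarks, where every architecture's training, validation, and test accuracy is precomputed, so "evaluating" an architecture is a table lookup rather than a GPU run. The sketch below illustrates the idea with a hypothetical in-memory table and made-up values; the real NAS-Bench-101/201 releases ship their own query APIs.

```python
# Hypothetical stand-in for a tabular NAS benchmark: accuracies are keyed by
# (architecture id, training epochs). All ids and values are illustrative.
PRECOMPUTED = {
    ("arch_0f3a", 12): 0.8931, ("arch_0f3a", 108): 0.9402,
    ("arch_7bc1", 12): 0.8710, ("arch_7bc1", 108): 0.9275,
}

def query_val_accuracy(arch_id: str, epochs: int = 108) -> float:
    """Look up a precomputed validation accuracy; no training is performed."""
    return PRECOMPUTED[(arch_id, epochs)]

print(query_val_accuracy("arch_7bc1", epochs=12))  # 0.871
```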
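
The Hardware Specification row mentions 11 initialization-time budgets spaced logarithmically between 1 second and 1.8 × 10^7 seconds. Assuming "logarithmically" means geometric spacing, the grid can be reproduced in one line:

```python
import numpy as np

# 11 budgets from 1 s to 1.8e7 s; each step multiplies by roughly 5.3x.
budgets = np.geomspace(1.0, 1.8e7, num=11)
print(np.round(budgets).astype(int))
```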
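
Finally, the Experiment Setup row describes the paper's tuning protocol: random search over each predictor's hyperparameters for 5000 iterations with a 15-minute wall-clock cap. A generic sketch of that protocol follows, assuming a user-supplied scoring function (e.g., cross-validated rank correlation); the ranges shown are placeholders, since the paper's actual per-predictor ranges are in its Section B.2.

```python
import random
import time

def random_search(eval_fn, ranges, n_iters=5000, max_seconds=15 * 60, seed=0):
    """Random-search tuning with an iteration cap and a wall-clock cap,
    mirroring the paper's 5000-iteration / 15-minute protocol."""
    rng = random.Random(seed)
    start = time.time()
    best_score, best_config = float("-inf"), None
    for _ in range(n_iters):
        if time.time() - start > max_seconds:
            break  # respect the wall-clock budget
        # Uniform sampling for simplicity; log-scale parameters such as
        # learning rates are often sampled log-uniformly instead.
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        score = eval_fn(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

# Placeholder ranges (hypothetical, not the paper's):
ranges = {"learning_rate": (1e-4, 1e-1), "weight_decay": (1e-6, 1e-3)}
best, score = random_search(lambda cfg: -cfg["learning_rate"], ranges, n_iters=100)
```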