BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

Authors: Colin White, Willie Neiswanger, Yash Savani

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test several different methods for each component and also develop a novel path-based encoding scheme for neural architectures, which we show theoretically and empirically scales better than other encodings. Using all of our analyses, we develop a final algorithm called BANANAS, which achieves state-of-the-art performance on NAS search spaces. (A sketch of such a path encoding appears after the table.)
Researcher Affiliation | Collaboration | Colin White (1), Willie Neiswanger (2,3), Yash Savani (1); affiliations: (1) Abacus.AI, (2) Stanford University, (3) Petuum, Inc.
Pseudocode | Yes | See Algorithm 1 and Figure 4 ("Algorithm 1 BANANAS"). (A hedged sketch of the BANANAS loop appears after the table.)
Open Source Code | Yes | We adhere to the NAS research checklist (Lindauer and Hutter 2019) to facilitate best practices, and our code is available at https://github.com/naszilla/naszilla.
Open Datasets | Yes | We run experiments on the NASBench-101 dataset... The NASBench-101 dataset (Ying et al. 2019)... We use the open source version of the NASBench-101 dataset (Ying et al. 2019). The NASBench-201 dataset (Yang, Esperança, and Carlucci 2020)...
Dataset Splits | No | The NASBench-101 dataset... each architecture comes with precomputed validation and test accuracies on CIFAR-10. We compare the different neural predictors by training them on a set of neural architectures drawn i.i.d. from NASBench-101, along with validation accuracies, and then computing the MAE on a held-out test set of size 1000. The paper notes that architectures come with "precomputed validation and test accuracies", which are inherent to the NASBench datasets. For their neural predictor, the authors specify a "held-out test set of size 1000" but give neither the size or percentage of the training set nor a dedicated validation set for the predictor's training process.
Hardware Specification | Yes | Each algorithm is given a budget of 47 TPU hours, or about 150 neural architecture evaluations on NASBench-101... The runtime unit is total GPU-days on a Tesla V100.
Software Dependencies | No | The paper mentions software components such as a "sequential fully-connected network", the "Adam optimizer", a "GCN", a "VAE", and a "BOHAMIANN implementation", and implies the use of PyTorch through a citation, but it does not specify version numbers for any of these dependencies (e.g., "PyTorch 1.9").
Experiment Setup | Yes | The feedforward neural network we use is a sequential fully-connected network with 10 layers of width 20, the Adam optimizer with a learning rate of 0.01, and the loss function set to mean absolute error (MAE)... Each algorithm is given a budget of 47 TPU hours, or about 150 neural architecture evaluations on NASBench-101... The algorithms output 10 architectures in each iteration of BO for better parallelization... For the loss function in the neural predictors, we use mean absolute percentage error (MAPE)... In each evaluation, the chosen architecture is trained for 50 epochs and the average validation error of the last 5 epochs is recorded. (A hedged sketch of this predictor setup appears after the table.)
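
The path-based encoding highlighted in the Research Type row can be illustrated with a short sketch. This is a minimal illustration under assumed conventions (a cell given as an adjacency matrix plus a per-node operation list, a three-operation vocabulary, and paths truncated at length 3); it is not the naszilla implementation. Each architecture is featurized by which input-to-output operation sequences ("paths") it contains.

```python
from itertools import product

# Illustrative operation vocabulary; an assumption of this sketch, not the
# paper's exact search-space definition.
OPS = ["conv3x3", "conv1x1", "maxpool3x3"]

def enumerate_paths(adjacency, ops):
    """Return every input->output path as the tuple of operations it passes through.

    `adjacency` is an n x n 0/1 matrix over nodes 0..n-1, `ops[i]` is the operation
    at node i; node 0 is the input and node n-1 is the output.
    """
    n = len(adjacency)
    paths = []

    def dfs(node, ops_on_path):
        if node == n - 1:                      # reached the output node
            paths.append(tuple(ops_on_path))
            return
        for nxt in range(n):
            if adjacency[node][nxt]:
                extra = [] if nxt == n - 1 else [ops[nxt]]
                dfs(nxt, ops_on_path + extra)

    dfs(0, [])
    return paths

def path_encoding(adjacency, ops, max_len=3):
    """Binary feature vector: one bit per possible operation sequence up to max_len,
    set to 1 if the cell contains an input->output path with that sequence."""
    feature_space = [()]
    for length in range(1, max_len + 1):
        feature_space += list(product(OPS, repeat=length))
    present = set(enumerate_paths(adjacency, ops))
    return [1 if p in present else 0 for p in feature_space]

# Tiny cell: input -> conv3x3 -> output and input -> maxpool3x3 -> conv1x1 -> output.
ops = ["input", "conv3x3", "maxpool3x3", "conv1x1", "output"]
adj = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
print(sum(path_encoding(adj, ops)))  # 2 paths present out of 1 + 3 + 9 + 27 = 40 features
```

The paper's scaling argument rests on truncating this feature space to the most relevant paths; the `max_len` parameter above only loosely mimics that truncation.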
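For the Pseudocode row, a high-level sketch of the Bayesian-optimization loop described in Algorithm 1 is given below. It is written against hypothetical helpers passed in by the caller (`random_architecture`, `mutate`, `evaluate`, `encode`, `train_predictor`, `predict`) rather than the naszilla API, and the acquisition step shown is an independent Thompson-sampling-style draw from an ensemble of predictors. The hyperparameter defaults are illustrative, except that 10 architectures per iteration and roughly 150 total evaluations follow the rows above.

```python
import numpy as np

def bananas(random_architecture, mutate, evaluate, encode, train_predictor, predict,
            num_init=10, iterations=14, ensemble_size=5,
            num_candidates=100, batch_size=10, seed=0):
    """Sketch of the BANANAS loop: fit an ensemble of neural predictors on the
    architectures evaluated so far, score mutated candidates with a
    Thompson-sampling-style acquisition, and evaluate the best-scoring ones.
    With these defaults, 10 + 14 * 10 = 150 architectures are evaluated,
    matching the reported budget of about 150 evaluations."""
    rng = np.random.default_rng(seed)

    # 1. Evaluate an initial set of random architectures (validation error).
    history = [(arch, evaluate(arch)) for arch in
               (random_architecture(rng) for _ in range(num_init))]

    for _ in range(iterations):
        xs = np.array([encode(arch) for arch, _ in history])
        ys = np.array([err for _, err in history])

        # 2. Train an ensemble of neural predictors (e.g. the sketch further below),
        #    using different seeds for diversity.
        ensemble = [train_predictor(xs, ys, seed=i) for i in range(ensemble_size)]

        # 3. Propose candidates by mutating the best architectures found so far.
        best = [arch for arch, _ in sorted(history, key=lambda t: t[1])[:10]]
        candidates = [mutate(best[rng.integers(len(best))], rng)
                      for _ in range(num_candidates)]

        # 4. Acquisition: draw one Gaussian sample per candidate from the ensemble's
        #    predictive mean and standard deviation, then keep the candidates with
        #    the lowest sampled validation error.
        preds = np.array([[predict(m, encode(c)) for m in ensemble]
                          for c in candidates])
        sampled = rng.normal(preds.mean(axis=1), preds.std(axis=1) + 1e-8)
        chosen = [candidates[i] for i in np.argsort(sampled)[:batch_size]]

        # 5. Evaluate the chosen architectures and append them to the history.
        history += [(arch, evaluate(arch)) for arch in chosen]

    return min(history, key=lambda t: t[1])   # best (architecture, error) pair
```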
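Finally, a hedged sketch of the neural predictor described in the Experiment Setup row: a sequential fully-connected network with 10 hidden layers of width 20, trained with Adam (learning rate 0.01) on MAE. PyTorch is assumed here (the report notes it is only implied by a citation), and the epoch count and full-batch training are assumptions of this sketch; the row also notes the paper switches the predictor loss to MAPE in its open-domain experiments. These two functions could stand in for the `train_predictor` and `predict` hooks used in the loop sketch above.

```python
import numpy as np
import torch
from torch import nn

def make_predictor(input_dim: int) -> nn.Sequential:
    """Sequential fully-connected network: 10 hidden layers of width 20,
    plus a final linear layer producing a scalar predicted validation error."""
    layers, width, in_dim = [], 20, input_dim
    for _ in range(10):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def train_predictor(xs: np.ndarray, ys: np.ndarray, seed: int = 0,
                    epochs: int = 200) -> nn.Sequential:
    """Fit one ensemble member with Adam (lr = 0.01) and MAE loss.
    Full-batch training and the epoch count are assumptions of this sketch."""
    torch.manual_seed(seed)                      # different seeds -> ensemble diversity
    x = torch.as_tensor(xs, dtype=torch.float32)
    y = torch.as_tensor(ys, dtype=torch.float32)
    model = make_predictor(x.shape[1])
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.L1Loss()                        # mean absolute error (MAE)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y)
        loss.backward()
        optimizer.step()
    return model

def predict(model: nn.Sequential, encoding) -> float:
    """Predicted validation error for a single path-encoded architecture."""
    with torch.no_grad():
        x = torch.as_tensor(encoding, dtype=torch.float32).unsqueeze(0)
        return model(x).item()
```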