BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search
Authors: Colin White, Willie Neiswanger, Yash Savani (pp. 10293-10301)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test several different methods for each component and also develop a novel path-based encoding scheme for neural architectures, which we show theoretically and empirically scales better than other encodings. Using all of our analyses, we develop a final algorithm called BANANAS, which achieves state-of-the-art performance on NAS search spaces. (See the path-encoding sketch below the table.) |
| Researcher Affiliation | Collaboration | Colin White¹, Willie Neiswanger²,³, Yash Savani¹ (¹Abacus.AI, ²Stanford University, ³Petuum, Inc.) |
| Pseudocode | Yes | See Algorithm 1 ("BANANAS") and Figure 4. (A minimal sketch of the BANANAS loop appears below the table.) |
| Open Source Code | Yes | We adhere to the NAS research checklist (Lindauer and Hutter 2019) to facilitate best practices, and our code is available at https://github.com/naszilla/naszilla. |
| Open Datasets | Yes | We run experiments on the NASBench-101 dataset... The NASBench-101 dataset (Ying et al. 2019)... We use the open source version of the NASBench-101 dataset (Ying et al. 2019). The NASBench-201 dataset (Yang, Esperança, and Carlucci 2020)... |
| Dataset Splits | No | The NASBench-101 dataset... each architecture comes with precomputed validation and test accuracies on CIFAR-10. We compare the different neural predictors by training them on a set of neural architectures drawn i.i.d. from NASBench-101, along with validation accuracies, and then computing the MAE on a held-out test set of size 1000. The paper notes that architectures come with 'precomputed validation and test accuracies', which are inherent to the NASBench datasets. For their neural predictor, the authors specify a 'held-out test set of size 1000' but give neither the size nor the percentage of the training set, nor a dedicated validation set for the predictor's training process. |
| Hardware Specification | Yes | Each algorithm is given a budget of 47 TPU hours, or about 150 neural architecture evaluations on NASBench-101... The runtime unit is total GPU-days on a Tesla V100. |
| Software Dependencies | No | The paper mentions software components like 'sequential fully-connected network', 'Adam optimizer', 'GCN', 'VAE', and 'BOHAMIANN implementation', and implies usage of PyTorch through a citation, but it does not specify any version numbers for these software dependencies (e.g., 'PyTorch 1.9', 'Adam optimizer vX.Y'). |
| Experiment Setup | Yes | The feedforward neural network we use is a sequential fully-connected network with 10 layers of width 20, the Adam optimizer with a learning rate of 0.01, and the loss function set to mean absolute error (MAE)... Each algorithm is given a budget of 47 TPU hours, or about 150 neural architecture evaluations on NASBench-101... The algorithms output 10 architectures in each iteration of BO for better parallelization... For the loss function in the neural predictors, we use mean absolute percentage error (MAPE)... In each evaluation, the chosen architecture is trained for 50 epochs and the average validation error of the last 5 epochs is recorded. (A PyTorch sketch of this predictor appears below the table.) |
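For readers checking the path-based encoding claim in the Research Type row, a minimal sketch follows. It assumes a NASBench-101-style cell: an upper-triangular adjacency matrix over topologically ordered nodes, with node 0 the input, the last node the output, and an operation label per intermediate node. The function names and the `OPS` list are our own illustration, not the authors' code from the naszilla repository.

```python
from itertools import product

# Operations allowed on intermediate nodes in a NASBench-101-style cell.
OPS = ["conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3"]

def enumerate_paths(adjacency, ops):
    """Return every input->output path as a tuple of operation labels.

    adjacency[i][j] == 1 means an edge from node i to node j, with nodes
    in topological order: node 0 is the input, node n-1 the output.
    """
    n = len(adjacency)
    paths = []

    def dfs(node, path):
        if node == n - 1:                  # reached the output node
            paths.append(tuple(path))
            return
        for nxt in range(node + 1, n):
            if not adjacency[node][nxt]:
                continue
            if nxt == n - 1:               # the output node carries no op
                dfs(nxt, path)
            else:
                dfs(nxt, path + [ops[nxt]])

    dfs(0, [])
    return paths

def path_encoding(adjacency, ops, max_len=5):
    """One-hot vector over all op sequences of length 0..max_len:
    entry i is 1 iff the i-th possible path occurs in the cell."""
    all_paths = [()]
    for length in range(1, max_len + 1):
        all_paths.extend(product(OPS, repeat=length))
    index = {p: i for i, p in enumerate(all_paths)}

    vec = [0] * len(all_paths)
    for p in enumerate_paths(adjacency, ops):
        vec[index[p]] = 1
    return vec
```

With 3 operations and paths of length up to 5 (NASBench-101's 7-node cells have at most 5 intermediate nodes), the vector has 1 + 3 + 9 + 27 + 81 + 243 = 364 entries, which is consistent with the encoding's size scaling with the number of possible paths rather than with the adjacency matrix.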
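The Pseudocode row points to Algorithm 1; the sketch below paraphrases that loop under stated assumptions. The `search_space` interface and the `train_ensemble` callback are placeholders for components the paper describes (an ensemble of path-encoded neural predictors, mutation-based candidate generation, and independent Thompson sampling as the acquisition function), not the authors' API.

```python
import random

def bananas(search_space, train_ensemble,
            init_size=10, iters=15, batch=10, num_candidates=100):
    """Hedged sketch of the BANANAS loop (Algorithm 1 in the paper).

    Assumed interfaces: search_space.random(), search_space.mutate(arch),
    search_space.evaluate(arch) returning a validation error, and
    train_ensemble(history) returning a function arch -> (mean_err, std)
    fit on the architectures evaluated so far.
    """
    history = [(a, search_space.evaluate(a))
               for a in (search_space.random() for _ in range(init_size))]

    for _ in range(iters):
        predict = train_ensemble(history)

        # Candidate pool: mutations of the best architectures found so far.
        best = [a for a, _ in sorted(history, key=lambda t: t[1])[:10]]
        candidates = [search_space.mutate(random.choice(best))
                      for _ in range(num_candidates)]

        # Independent Thompson sampling acquisition: draw one sample from
        # each candidate's predictive distribution and keep the lowest.
        def acquisition(arch):
            mean, std = predict(arch)
            return random.gauss(mean, std)

        chosen = sorted(candidates, key=acquisition)[:batch]
        history.extend((a, search_space.evaluate(a)) for a in chosen)

    return min(history, key=lambda t: t[1])[0]
```

The `batch=10` default mirrors the paper's setting of outputting 10 architectures per BO iteration for better parallelization; the other defaults are illustrative.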
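The Experiment Setup row pins down the meta neural network precisely enough for a sketch. Assuming a standard PyTorch stack (the paper implies PyTorch via citation but, per the Software Dependencies row, gives no versions), and reading "10 layers of width 20" as 10 hidden layers plus a scalar output head, the predictor and its training loop could look like the following; the epoch count is our assumption.

```python
import torch
import torch.nn as nn

def make_predictor(input_dim: int) -> nn.Sequential:
    """Meta neural network as described: a sequential fully-connected
    network with 10 layers of width 20, predicting validation error."""
    layers, dim = [], input_dim
    for _ in range(10):
        layers += [nn.Linear(dim, 20), nn.ReLU()]
        dim = 20
    layers.append(nn.Linear(dim, 1))       # scalar output head
    return nn.Sequential(*layers)

def train_predictor(model, x_train, y_train, epochs=200):
    """Adam with lr=0.01 and MAE loss, per the paper; epochs=200 is our
    assumption (the quoted setup does not state a count). The paper also
    reports using MAPE for the predictors in the BO experiments."""
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.L1Loss()                  # mean absolute error
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x_train).squeeze(-1), y_train)
        loss.backward()
        opt.step()
    return model
```

Here `input_dim` would be the length of the path encoding (364 for NASBench-101 under the sketch above), and `x_train`/`y_train` would be encoded architectures and their validation errors drawn from the benchmark.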