Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Authors: Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric P. Xing

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks.
Researcher Affiliation | Collaboration | Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabás Póczos, Eric P Xing; Carnegie Mellon University, Petuum Inc.; {kandasamy, willie, schneide, bapoczos, epxing}@cs.cmu.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | Our python implementations of OTMANN and NASBOT are available at github.com/kirthevasank/nasbot.
Open Datasets | Yes | We use the following datasets: blog feedback [4], indoor location [46], slice localisation [11], naval propulsion [5], protein tertiary structure [34], news popularity [7], Cifar10 [24].
Dataset Splits | Yes | For the first 6 datasets, we use a 0.6/0.2/0.2 train-validation-test split and normalise the input and output to have zero mean and unit variance; for Cifar10 we use 40K points for training and 10K each for validation and testing. (A split sketch appears below the table.)
Hardware Specification | Yes | For the blog, indoor, slice, naval and protein datasets we use 2 GeForce GTX 970 (4GB) GPUs and a computational budget of 8 hours for each method; for the news popularity dataset we use 4 GeForce GTX 980 (6GB) GPUs with a budget of 6 hours, and for Cifar10 we use 4 K80 (12GB) GPUs with a budget of 10 hours.
Software Dependencies | No | The paper mentions 'Our python implementations' but does not specify a Python version or any other software dependencies with version numbers.
Experiment Setup | Yes | For the regression datasets, we train each model with stochastic gradient descent (SGD) using a fixed step size of 10⁻⁵ and a batch size of 256 for 20K batch iterations. For Cifar10, we start with a step size of 10⁻² and reduce it gradually, training in batches of 32 images for 60K batch iterations.
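
The Dataset Splits row above describes a 0.6/0.2/0.2 train-validation-test split with zero-mean, unit-variance normalisation of inputs and outputs. The following is a minimal sketch of that preprocessing, not the authors' code: the function name and random seed are illustrative, and the use of training-set statistics for normalisation is an assumption (the paper does not say which statistics are used).

```python
import numpy as np

def split_and_normalise(X, y, seed=0):
    """Illustrative 0.6/0.2/0.2 split with zero-mean, unit-variance scaling."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_va = int(0.6 * len(X)), int(0.2 * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    # Assumption: normalise with training-set statistics only.
    x_mu, x_sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-8
    y_mu, y_sd = y[tr].mean(), y[tr].std() + 1e-8
    norm = lambda a, mu, sd: (a - mu) / sd
    return [(norm(X[s], x_mu, x_sd), norm(y[s], y_mu, y_sd))
            for s in (tr, va, te)]
```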
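
The Experiment Setup row can likewise be read as a training configuration. Below is a hedged sketch in PyTorch (the paper does not name its deep learning framework): the MLP is a placeholder for the searched architectures, and the step-decay schedule for Cifar10 is an assumption standing in for "reduce it gradually".

```python
import torch
import torch.nn as nn

def regression_setup(input_dim):
    # Placeholder MLP; the actual architectures are produced by the search.
    model = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    # Regression datasets: fixed step size 1e-5, batch size 256, 20K iterations.
    opt = torch.optim.SGD(model.parameters(), lr=1e-5)
    return model, opt, dict(batch_size=256, num_iters=20_000)

def cifar10_setup(model):
    # Cifar10: start at 1e-2 and decay gradually (StepLR is an assumed schedule),
    # batches of 32 images for 60K iterations.
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20_000, gamma=0.1)
    return opt, sched, dict(batch_size=32, num_iters=60_000)
```

The per-dataset GPU allocations and wall-clock budgets in the Hardware Specification row would sit outside this sketch, as constraints on how long each candidate architecture is trained.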