Neural Architecture Search with Bayesian Optimisation and Optimal Transport
Authors: Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric P. Xing
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks. |
| Researcher Affiliation | Collaboration | Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabás Póczos, Eric P Xing; Carnegie Mellon University, Petuum Inc.; {kandasamy, willie, schneide, bapoczos, epxing}@cs.cmu.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our python implementations of OTMANN and NASBOT are available at github.com/kirthevasank/nasbot. |
| Open Datasets | Yes | We use the following datasets: blog feedback [4], indoor location [46], slice localisation [11], naval propulsion [5], protein tertiary structure [34], news popularity [7], Cifar10 [24]. |
| Dataset Splits | Yes | For the first 6 datasets, we use a 0.6/0.2/0.2 train-validation-test split and normalised the input and output to have zero mean and unit variance. For Cifar10 we use 40K for training and 10K each for validation and testing. |
| Hardware Specification | Yes | For the blog, indoor, slice, naval and protein datasets we use 2 GeForce GTX 970 (4GB) GPUs and a computational budget of 8 hours for each method. For the news popularity dataset we use 4 GeForce GTX 980 (6GB) GPUs with a budget of 6 hours and for Cifar10 we use 4 K80 (12GB) GPUs with a budget of 10 hours. |
| Software Dependencies | No | The paper mentions 'Our python implementations' but does not specify Python version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For the regression datasets, we train each model with stochastic gradient descent (SGD) with a fixed step size of 10^-5, a batch size of 256 for 20K batch iterations. For Cifar10, we start with a step size of 10^-2, and reduce it gradually. We train in batches of 32 images for 60K batch iterations. |
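
As a concrete reading of the "Dataset Splits" row above, the sketch below shows one way to realise a 0.6/0.2/0.2 train-validation-test split with zero-mean, unit-variance normalisation. It is a minimal illustration assuming NumPy arrays and standardisation from training-fold statistics; the function and variable names are illustrative and not taken from the NASBOT codebase.

```python
import numpy as np

def split_and_normalise(X, y, seed=0):
    """Shuffle, split 60/20/20, and standardise inputs and outputs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])

    # Statistics are computed on the training fold only (an assumption; the
    # paper only states that inputs and outputs are normalised to zero mean
    # and unit variance).
    x_mean, x_std = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-8
    y_mean, y_std = y[tr].mean(), y[tr].std() + 1e-8

    def standardise(ids):
        return (X[ids] - x_mean) / x_std, (y[ids] - y_mean) / y_std

    return {"train": standardise(tr), "val": standardise(va), "test": standardise(te)}
```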
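
Similarly, the "Experiment Setup" row describes plain SGD with a fixed step size of 10^-5, a batch size of 256, and 20K batch iterations for the regression datasets. The sketch below is a hedged PyTorch rendering of that loop; the original experiments are not necessarily implemented in PyTorch, and `model` and `train_loader` are placeholders for the architecture under evaluation and a batch-size-256 data loader.

```python
import torch
from torch import nn

def train_regression_model(model, train_loader, n_iters=20_000, lr=1e-5):
    """Plain SGD with a fixed step size, counted in batch iterations."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    it = 0
    while it < n_iters:
        for xb, yb in train_loader:  # assumed DataLoader with batch_size=256
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            it += 1
            if it >= n_iters:
                break
    return model
```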