Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neural Architecture Search with Bayesian Optimisation and Optimal Transport
Authors: Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric P. Xing
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks. |
| Researcher Affiliation | Collaboration | Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabás Póczos, Eric P. Xing; Carnegie Mellon University, Petuum Inc. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our python implementations of OTMANN and NASBOT are available at github.com/kirthevasank/nasbot. |
| Open Datasets | Yes | We use the following datasets: blog feedback [4], indoor location [46], slice localisation [11], naval propulsion [5], protein tertiary structure [34], news popularity [7], Cifar10 [24]. |
| Dataset Splits | Yes | For the first 6 datasets, we use a 0.6-0.2-0.2 train-validation-test split and normalised the input and output to have zero mean and unit variance. For Cifar10 we use 40K for training and 10K each for validation and testing. |
| Hardware Specification | Yes | For the blog, indoor, slice, naval and protein datasets we use 2 GeForce GTX 970 (4GB) GPUs and a computational budget of 8 hours for each method. For the news popularity dataset we use 4 GeForce GTX 980 (6GB) GPUs with a budget of 6 hours and for Cifar10 we use 4 K80 (12GB) GPUs with a budget of 10 hours. |
| Software Dependencies | No | The paper mentions 'Our python implementations' but does not specify Python version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For the regression datasets, we train each model with stochastic gradient descent (SGD) with a fixed step size of $10^{-5}$, a batch size of 256 for 20K batch iterations. For Cifar10, we start with a step size of $10^{-2}$, and reduce it gradually. We train in batches of 32 images for 60K batch iterations. |
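
The split and normalisation quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code. The function names `split_indices` and `standardise`, the random shuffle, the seed, and the choice to compute the mean and standard deviation on the training portion only are all assumptions. The quoted text gives only the train-validation-test proportions and says that inputs and outputs were normalised to zero mean and unit variance.

```python
import random
import statistics


def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    The shuffle and the seed are assumptions; the source only gives the
    proportions.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    # Whatever remains after train and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def standardise(values, reference):
    """Scale `values` to zero mean and unit variance using `reference` statistics.

    Using training-set statistics as the reference is an assumption.
    `reference` must not be constant, otherwise the division fails.
    """
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    return [(v - mean) / std for v in values]


# Hypothetical one-dimensional target with 1000 examples.
y = [float(i) for i in range(1000)]
train, val, test = split_indices(len(y))
y_train_raw = [y[i] for i in train]
y_train = standardise(y_train_raw, y_train_raw)
y_val = standardise([y[i] for i in val], y_train_raw)
```

Applying the training statistics to the validation and test portions avoids leaking information from held-out data. If normalisation was instead computed over the whole dataset, pass the full column as `reference`.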