Transfer NAS with Meta-learned Bayesian Surrogates

Authors: Gresa Shala, Thomas Elsken, Frank Hutter, Josif Grabocka

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "As a result, our method consistently achieves state-of-the-art results on six computer vision datasets, while being as fast as one-shot NAS methods." (Abstract); "Our resulting method outperforms both state-of-the-art blackbox NAS methods as well as state-of-the-art one-shot methods across six computer vision benchmarks." (Introduction); Section 4 ("Experimental Setup"); Section 5 ("Research Hypotheses and Experimental Results")
Researcher Affiliation | Collaboration | Gresa Shala¹, Thomas Elsken², Frank Hutter¹,² & Josif Grabocka¹; ¹Department of Computer Science, University of Freiburg ({shalag,fh,grabocka}@cs.uni-freiburg.de); ²Bosch Center for Artificial Intelligence (thomas.elsken@de.bosch.com)
Pseudocode | Yes | Algorithm 1, "Meta-learning our deep-kernel GPs" (an illustrative sketch of such a meta-training loop appears after this table)
Open Source Code | Yes | "To foster reproducibility, we make our code available at https://github.com/TNAS-DCS/TNAS-DCS." (Abstract); "The code for our NAS method can be found in https://anonymous.4open.science/r/TNAS-DCS-CC08." (Appendix F, point 1d)
Open Datasets | Yes | "We follow the experimental setup as Lee et al. (2021) for the NAS-Bench-201 (Dong & Yang, 2020) and MobileNet V3 search spaces. On the NAS-Bench-201 search space, we also use the same meta-datasets as Lee et al. (2021). It consists of 4230 meta-training datasets derived from ImageNet. For the evaluation of our method and the baselines, we use six popular computer vision datasets: CIFAR-10, CIFAR-100, SVHN, Aircraft, Oxford-IIIT Pets, and MNIST."
Dataset Splits | Yes | "Recall that we assume we are given a set of Q datasets, where on each dataset $D_q$ we have $N_q \in \mathbb{N}^+$ evaluated architectures. We denote the n-th architecture evaluated on the q-th dataset as $x_{q,n}$ and its validation accuracy as $y_{q,n}$." (Section 3.2); "We tuned the dimensionality of the embedding of the dataset encoder and graph encoder ... using the multi-fidelity Bayesian optimization method BOHB (Falkner et al., 2018) on the meta-training dataset" (Section 3.1)
Hardware Specification | No | The paper mentions running experiments on "the same hardware" and refers to "GPU hours", but does not provide specific details about the GPU or CPU models, or any other hardware specifications.
Software Dependencies | No | The paper mentions using the "GPyTorch (Gardner et al., 2018) implementation" but does not provide a specific version number for GPyTorch or any other software dependency.
Experiment Setup | Yes | "We tuned the dimensionality of the embedding of the dataset encoder and graph encoder (Embedding dims.), the architecture of the feed-forward neural network of our method (Num. layers, Num. units in layer 1, Num. units in layer 2, Num. units in layer 3, Num. units in layer 4), and the learning rate of the joint meta-training using BOHB (Falkner et al., 2018) on the meta-training dataset; please refer to Appendix A for details." (Section 3.1); the Hyperparameter/Range table (Appendix A); "First, we evaluate the top-5 architectures from the meta-training set on the test dataset... In the NAS-Bench-201 search space, we repeat this BO loop for a total of 100 evaluations for CIFAR-10 and CIFAR-100, whereas for SVHN, Aircraft, Pets, and MNIST, we do 40 evaluations. In the MobileNet V3 search space we do 50 evaluations for each of the datasets." (Appendix C). Illustrative sketches of a BOHB-style tuning space and of the warm-started BO evaluation loop follow the table.
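
The paper's Algorithm 1 meta-learns a deep-kernel GP surrogate across the meta-training datasets. The sketch below is not the authors' algorithm: it is a minimal illustration of the general pattern, assuming a plain feed-forward feature extractor over pre-encoded architectures, a Matérn kernel on the learned features, and the per-dataset structure quoted under "Dataset Splits" (each task q contributes architecture encodings x_{q,n} with validation accuracies y_{q,n}). The dataset encoder, graph encoder, and all hyperparameter values from the paper are omitted.

```python
# Minimal sketch of meta-training a deep-kernel GP surrogate across tasks.
# NOT the paper's Algorithm 1: the dataset/graph encoders are replaced by a
# plain MLP over pre-encoded architectures, and all sizes are placeholders.
import torch
import gpytorch


class DeepKernelGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, in_dim, feat_dim=32):
        super().__init__(train_x, train_y, likelihood)
        # Feature extractor shared (and meta-learned) across all tasks.
        self.feature_extractor = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, feat_dim),
        )
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5))

    def forward(self, x):
        z = self.feature_extractor(x)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))


def meta_train(tasks, in_dim, epochs=100, lr=1e-3):
    """tasks: list of (X_q, y_q) tensor pairs, one per meta-training dataset.
    X_q holds the N_q architecture encodings, y_q their validation accuracies."""
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    # Initialise the GP with the first task's data; it is swapped per task below.
    model = DeepKernelGP(tasks[0][0], tasks[0][1], likelihood, in_dim)
    model.train()
    likelihood.train()
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for X_q, y_q in tasks:  # iterate over the Q meta-training datasets
            model.set_train_data(X_q, y_q, strict=False)
            opt.zero_grad()
            loss = -mll(model(X_q), y_q)  # negative marginal log-likelihood
            loss.backward()
            opt.step()
    return model, likelihood
```

In this simplified version, transfer happens only through the kernel's shared feature extractor; in the paper, the surrogate additionally conditions on a learned dataset encoding so that a single GP transfers across datasets.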
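The hyperparameters listed under "Experiment Setup" (embedding dimensionalities, number of layers and units of the feed-forward network, and the meta-training learning rate) were tuned with BOHB. The snippet below only shows how such a search space could be declared with ConfigSpace, the library used by BOHB's reference implementation; the ranges are placeholders, not the values from the paper's Appendix A.

```python
# Hypothetical ConfigSpace declaration for the tuned hyperparameters.
# The ranges below are placeholders; Appendix A of the paper gives the real ones.
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    UniformIntegerHyperparameter, UniformFloatHyperparameter)

cs = ConfigurationSpace()
cs.add_hyperparameters([
    UniformIntegerHyperparameter("embedding_dims", 8, 128),    # dataset/graph encoder embedding
    UniformIntegerHyperparameter("num_layers", 1, 4),          # feed-forward network depth
    UniformIntegerHyperparameter("num_units_layer_1", 16, 512),
    UniformIntegerHyperparameter("num_units_layer_2", 16, 512),
    UniformIntegerHyperparameter("num_units_layer_3", 16, 512),
    UniformIntegerHyperparameter("num_units_layer_4", 16, 512),
    UniformFloatHyperparameter("learning_rate", 1e-5, 1e-1, log=True),
])
```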
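Appendix C (quoted above) describes the evaluation protocol: the top-5 architectures from the meta-training set are evaluated first, then a BO loop runs until the per-dataset budget is exhausted (100 evaluations on CIFAR-10/100, 40 on SVHN, Aircraft, Pets, and MNIST, 50 on MobileNet V3). The sketch below illustrates that outer loop under simplifying assumptions: the candidate pool is a fixed list of encoded architectures, the surrogate exposes `fit`/`predict` returning a mean and standard deviation, and expected improvement serves as the acquisition function; the paper's exact interfaces and acquisition choice are not restated here.

```python
# Sketch of the warm-started BO loop described in Appendix C (simplified).
# `surrogate.fit`/`surrogate.predict`, `evaluate`, and the EI acquisition are
# illustrative stand-ins, not the paper's exact interfaces.
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_y):
    z = (mu - best_y) / np.maximum(sigma, 1e-9)
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)


def run_search(candidates, top5_from_meta_train, surrogate, evaluate, budget=40):
    """candidates: encoded architectures; evaluate(x) -> validation accuracy."""
    observed_x, observed_y = [], []
    # Warm start: evaluate the top-5 architectures from the meta-training set.
    for x in top5_from_meta_train:
        observed_x.append(x)
        observed_y.append(evaluate(x))
    # BO loop until the per-dataset evaluation budget is spent.
    while len(observed_y) < budget:
        surrogate.fit(np.array(observed_x), np.array(observed_y))
        mu, sigma = surrogate.predict(np.array(candidates))
        ei = expected_improvement(mu, sigma, max(observed_y))
        x_next = candidates[int(np.argmax(ei))]  # most promising candidate
        observed_x.append(x_next)
        observed_y.append(evaluate(x_next))
    best = int(np.argmax(observed_y))
    return observed_x[best], observed_y[best]
```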