Transfer NAS with Meta-learned Bayesian Surrogates
Authors: Gresa Shala, Thomas Elsken, Frank Hutter, Josif Grabocka
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a result, our method consistently achieves state-of-the-art results on six computer vision datasets, while being as fast as one-shot NAS methods. (Abstract); Our resulting method outperforms both state-of-the-art blackbox NAS methods as well as state-of-the-art one-shot methods across six computer vision benchmarks. (Introduction); Section 4 EXPERIMENTAL SETUP; Section 5 RESEARCH HYPOTHESES AND EXPERIMENTAL RESULTS |
| Researcher Affiliation | Collaboration | Gresa Shala¹, Thomas Elsken², Frank Hutter¹,² & Josif Grabocka¹; ¹Department of Computer Science, University of Freiburg, {shalag,fh,grabocka}@cs.uni-freiburg.de; ²Bosch Center for Artificial Intelligence, thomas.elsken@de.bosch.com |
| Pseudocode | Yes | Algorithm 1: Meta-learning our deep-kernel GPs (an illustrative sketch follows this table) |
| Open Source Code | Yes | To foster reproducibility, we make our code available at https://github.com/TNAS-DCS/TNAS-DCS. (Abstract); The code for our NAS method can be found in https://anonymous.4open.science/r/TNAS-DCS-CC08. (Appendix F, point 1d) |
| Open Datasets | Yes | We follow the experimental setup as Lee et al. (2021) for the NAS-Bench-201 (Dong & Yang, 2020) and MobileNet V3 search spaces. On the NAS-Bench-201 search space, we also use the same meta datasets as Lee et al. (2021). It consists of 4230 meta-training datasets derived from ImageNet. For the evaluation of our method and the baselines, we use six popular computer vision datasets: CIFAR-10, CIFAR-100, SVHN, Aircraft, Oxford-IIIT Pets, and MNIST. |
| Dataset Splits | Yes | Recall that we assume we are given a set of Q datasets, where on each dataset $D_q$ we have $N_q \in \mathbb{N}^+$ evaluated architectures. We denote the n-th architecture evaluated on the q-th dataset as $x_{q,n}$ and its validation accuracy as $y_{q,n}$. (Section 3.2); We tuned the dimensionality of the embedding of the dataset encoder and graph encoder... using the multi-fidelity Bayesian optimization method BOHB (Falkner et al., 2018) on the meta-training dataset (Section 3.1) |
| Hardware Specification | No | The paper mentions running experiments on "the same hardware" and refers to "GPU hours" but does not provide specific details about the GPU or CPU models, or any other hardware specifications. |
| Software Dependencies | No | The paper mentions using "GPytorch (Gardner et al., 2018) implementation" but does not provide a specific version number for GPytorch or any other software dependencies. |
| Experiment Setup | Yes | We tuned the dimensionality of the embedding of the dataset encoder and graph encoder (Embedding dims.), the architecture of the feed-forward neural network of our method (Num. layers, Num. units in layer 1, Num. units in layer 2, Num. units in layer 3, Num. units in layer 4), and the learning rate of the joint meta-training using BOHB (Falkner et al., 2018) on the meta-training dataset; please refer to Appendix A for details. (Section 3.1); Hyperparameter Range table (Appendix A); First, we evaluate the top-5 architectures from the meta-training set on the test dataset... In the NAS-Bench-201 search space, we repeat this BO loop for a total of 100 evaluations for CIFAR-10 and CIFAR-100, whereas for SVHN, Aircraft, Pets, and MNIST, we do 40 evaluations. In the MobileNet V3 search space we do 50 evaluations for each of the datasets. (Appendix C); a sketch of this warm-started BO loop follows the table. |
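
The Pseudocode and Software Dependencies rows reference Algorithm 1 (meta-learning deep-kernel GPs) and a GPyTorch implementation. The following is a minimal, hypothetical sketch of such a surrogate: a shared feed-forward feature extractor feeding a GP kernel, jointly meta-trained by marginal likelihood across the Q meta-training datasets. It assumes architecture encodings are precomputed fixed-length vectors and omits the paper's dataset encoder and graph encoder; all class names and hyperparameters below are illustrative, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation) of meta-learning a
# deep-kernel GP surrogate with GPyTorch.
import torch
import gpytorch


class FeatureExtractor(torch.nn.Module):
    """Feed-forward network mapping architecture encodings to a latent space."""
    def __init__(self, in_dim, hidden=128, out_dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class DeepKernelGP(gpytorch.models.ExactGP):
    """Exact GP whose kernel operates on features produced by the extractor."""
    def __init__(self, train_x, train_y, likelihood, extractor):
        super().__init__(train_x, train_y, likelihood)
        self.extractor = extractor
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.extractor(x)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


def meta_train(tasks, in_dim, epochs=100, lr=1e-3):
    """tasks: list of (X_q, y_q) tensors, one per meta-training dataset."""
    extractor = FeatureExtractor(in_dim)
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    # One shared model; its training data is swapped per task so the feature
    # extractor and GP hyperparameters are meta-learned across all datasets.
    model = DeepKernelGP(tasks[0][0], tasks[0][1], likelihood, extractor)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    model.train()
    likelihood.train()
    for _ in range(epochs):
        for X_q, y_q in tasks:                 # iterate over meta-training datasets
            model.set_train_data(X_q, y_q, strict=False)
            opt.zero_grad()
            loss = -mll(model(X_q), y_q)       # negative marginal log-likelihood
            loss.backward()
            opt.step()
    return model, likelihood
```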
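
The Experiment Setup row describes warm-starting with the top-5 architectures from the meta-training set and then running a BO loop for a fixed per-dataset budget (100 evaluations for CIFAR-10/CIFAR-100 on NAS-Bench-201, 40 for SVHN, Aircraft, Pets, and MNIST, and 50 on MobileNet V3). Below is a hedged sketch of that protocol; `surrogate`, `candidate_pool`, and `evaluate_architecture` are hypothetical placeholders, and expected improvement is assumed as the acquisition function rather than confirmed by the paper.

```python
# Illustrative sketch (not the authors' code) of the quoted evaluation protocol:
# warm-start from the meta-training set, then a fixed-budget BO loop.
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_y, xi=0.01):
    """EI for maximization; the paper's actual acquisition function may differ."""
    sigma = np.maximum(sigma, 1e-9)
    imp = mu - best_y - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)


def bo_loop(surrogate, candidate_pool, evaluate_architecture, warm_start, budget):
    """warm_start: top-5 architectures from the meta-training set; budget: total evaluations."""
    observed_x, observed_y = [], []
    for arch in warm_start:                       # initial design from meta-training
        observed_x.append(arch)
        observed_y.append(evaluate_architecture(arch))

    while len(observed_y) < budget:
        surrogate.fit(observed_x, observed_y)     # condition surrogate on observations
        mu, sigma = surrogate.predict(candidate_pool)
        scores = expected_improvement(np.asarray(mu), np.asarray(sigma), max(observed_y))
        arch = candidate_pool[int(np.argmax(scores))]
        observed_x.append(arch)
        observed_y.append(evaluate_architecture(arch))

    best = int(np.argmax(observed_y))
    return observed_x[best], observed_y[best]
```

Here `budget` would be set per dataset as quoted above (e.g. 100 for CIFAR-10 on NAS-Bench-201), and `evaluate_architecture` stands in for querying the tabular benchmark or training the architecture to obtain its validation accuracy.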