BayesNAS: A Bayesian Approach for Neural Architecture Search
Authors: Hongpeng Zhou, Minghao Yang, Jun Wang, Wei Pan
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments focus on two scenarios in NAS: proxy NAS and proxyless NAS. Table 1. Classification errors of BayesNAS and state-of-the-art image classifiers on CIFAR-10. Table 2. Comparison with state-of-the-art image classifiers on ImageNet in the mobile setting. |
| Researcher Affiliation | Academia | 1. Department of Cognitive Robotics, Delft University of Technology, Netherlands; 2. Department of Computer Science, University College London, UK. Correspondence to: Wei Pan <wei.pan@tudelft.nl>. |
| Pseudocode | Yes | The pseudo code is summarized in Algorithm 1. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Impressively, this enabled us to find the architecture on CIFAR-10 within only 0.2 GPU days using a single GPU. Competitive performance can be also achieved by transferring to ImageNet. |
| Dataset Splits | Yes | The validation accuracy is presented in Table 1. |
| Hardware Specification | Yes | All the experiments were performed using NVIDIA TITAN V GPUs. |
| Software Dependencies | No | The paper mentions software components like 'SGD optimizer' and refers to 'deep learning open source software' but does not provide specific version numbers for any libraries or frameworks used. |
| Experiment Setup | Yes | Since we cache the feature maps in memory, we can only set batch size as 18. The optimizer we use is SGD optimizer with momentum 0.9 and fixed learning rate 0.1. In the searching stage, we set batch size to 32 and learning rate to 0.1. We use the same optimizer as for proxy search. After the architecture is determined, the network is trained from scratch with the batch size of 64, learning rate as 0.1 and cosine annealing learning rate decay schedule (Loshchilov & Hutter, 2017). A network of 14 cells is trained for 250 epochs with batch size 128, weight decay 3×10⁻⁵ and initial SGD learning rate 0.1 (decayed by a factor of 0.97 after each epoch). |
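
The quoted setup for training the discovered CIFAR-10 architecture from scratch (SGD with momentum 0.9, initial learning rate 0.1, batch size 64, cosine annealing decay) can be reproduced with a minimal sketch like the one below, assuming a PyTorch implementation. The placeholder network, synthetic data, and epoch count are illustrative only and are not taken from the paper.

```python
# Minimal sketch of the reported "train from scratch" configuration (assumed PyTorch).
# The model, dataset, and epoch count are placeholders, not details from the paper.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder network and data standing in for the discovered cell architecture
# and the CIFAR-10 training split.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
data = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
train_loader = DataLoader(data, batch_size=64, shuffle=True)  # batch size 64 as reported

# SGD with momentum 0.9 and initial learning rate 0.1, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Cosine annealing learning-rate decay schedule (Loshchilov & Hutter, 2017).
epochs = 10  # placeholder; the quoted CIFAR-10 text does not state the epoch count
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss()

for _ in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch
```

For the ImageNet run quoted in the same row, the scheduler would instead be a per-epoch exponential decay by 0.97 (e.g., `ExponentialLR(optimizer, gamma=0.97)`) with weight decay 3×10⁻⁵ and batch size 128.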