BayesNAS: A Bayesian Approach for Neural Architecture Search

Authors: Hongpeng Zhou, Minghao Yang, Jun Wang, Wei Pan

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments focus on two scenarios in NAS: proxy NAS and proxyless NAS. Table 1. Classification errors of BayesNAS and state-of-the-art image classifiers on CIFAR-10. Table 2. Comparison with state-of-the-art image classifiers on ImageNet in the mobile setting.
Researcher Affiliation | Academia | 1 Department of Cognitive Robotics, Delft University of Technology, Netherlands; 2 Department of Computer Science, University College London, UK. Correspondence to: Wei Pan <wei.pan@tudelft.nl>.
Pseudocode | Yes | The pseudocode is summarized in Algorithm 1.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | Impressively, this enabled us to find the architecture on CIFAR-10 within only 0.2 GPU days using a single GPU. Competitive performance can also be achieved by transferring to ImageNet.
Dataset Splits | Yes | The validation accuracy is presented in Table 1.
Hardware Specification | Yes | All the experiments were performed using NVIDIA TITAN V GPUs.
Software Dependencies | No | The paper mentions software components such as the 'SGD optimizer' and refers to 'deep learning open source software', but does not provide specific version numbers for any libraries or frameworks used.
Experiment Setup | Yes | Since we cache the feature maps in memory, we can only set the batch size to 18. The optimizer we use is SGD with momentum 0.9 and a fixed learning rate of 0.1. In the searching stage, we set the batch size to 32 and the learning rate to 0.1. We use the same optimizer as for proxy search. After the architecture is determined, the network is trained from scratch with a batch size of 64, a learning rate of 0.1, and a cosine annealing learning rate decay schedule (Loshchilov & Hutter, 2017). A network of 14 cells is trained for 250 epochs with batch size 128, weight decay 3×10⁻⁵, and an initial SGD learning rate of 0.1 (decayed by a factor of 0.97 after each epoch).
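
The reported hyperparameters can be sketched as optimizer and scheduler configurations. The snippet below is a minimal illustration assuming PyTorch (the paper only mentions "deep learning open source software" and does not name a framework); `model` and `epochs_cifar` are hypothetical placeholders, not values taken from the paper.

```python
# Hedged sketch of the reported training hyperparameters, assuming PyTorch.
# `model` stands in for the searched architecture and is not code from the paper.
import torch
from torch import nn, optim

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder network

# Final CIFAR-10 training: SGD with momentum 0.9 (as in the proxy search),
# learning rate 0.1, batch size 64, cosine annealing decay (Loshchilov & Hutter, 2017).
epochs_cifar = 600  # assumption: the excerpt does not state the CIFAR-10 epoch count
cifar_optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
cifar_scheduler = optim.lr_scheduler.CosineAnnealingLR(cifar_optimizer, T_max=epochs_cifar)

# ImageNet mobile-setting training: 14-cell network, 250 epochs, batch size 128,
# weight decay 3e-5, initial learning rate 0.1 decayed by 0.97 after each epoch.
imagenet_optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=3e-5)
imagenet_scheduler = optim.lr_scheduler.ExponentialLR(imagenet_optimizer, gamma=0.97)

for epoch in range(250):
    # ... one pass over the training data (forward, backward, optimizer.step()) ...
    imagenet_scheduler.step()  # apply the per-epoch 0.97 learning-rate decay
```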