Binarized Neural Architecture Search

Authors: Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, David Doermann, Rongrong Ji

AAAI 2020, pp. 10526-10533

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed BNAS achieves a performance comparable to NAS on both the CIFAR and ImageNet databases.
Researcher Affiliation | Academia | Beihang University; Xiamen University; University at Buffalo; Shenzhen Institutes of Advanced Technology, University of Chinese Academy of Sciences
Pseudocode | Yes | Algorithm 1: Performance-Based Search (an illustrative sketch follows the table).
Open Source Code | No | Both methods are easily implemented in our BNAS framework, and the source code will be publicly available soon.
Open Datasets | Yes | First, most experiments are conducted on CIFAR-10 (Krizhevsky, Hinton, and others 2009)... Then we further perform experiments to search architectures directly on ImageNet... evaluate the classification accuracy on ILSVRC 2012 ImageNet (Russakovsky et al. 2015)
Dataset Splits | Yes | During architecture search, the 50K training samples of CIFAR-10 are divided into two subsets of equal size, one for training the network weights and the other for finding the architecture hyper-parameters. When reducing the search space, we randomly select 5K images from the training set as a validation set (used in line 8 of Algorithm 1). A data-split sketch follows the table.
Hardware Specification | Yes | Our BNAS is 40% faster (tested on our platform, an NVIDIA GTX TITAN Xp).
Software Dependencies | No | All the experiments and models are implemented in PyTorch (Paszke et al. 2017).
Experiment Setup | Yes | We set the hyper-parameter C in PC-DARTS to 2 for CIFAR-10... The batch size is set to 128 during the search of an architecture for L = 5 epochs... We also set T = 3 (line 4 in Algorithm 1) and V = 1 (line 14), so the network is trained for fewer than 60 epochs, with a larger batch size of 400... The initial number of channels is 16. We use SGD with momentum to optimize the network weights, with an initial learning rate of 0.025 (annealed down to zero following a cosine schedule), a momentum of 0.9, and a weight decay of 5 × 10⁻⁴. The learning rate for finding the hyper-parameters is set to 0.01. A training-setup sketch follows the table.
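
The paper's pseudocode (Algorithm 1: Performance-Based Search) is not reproduced in this report, and no source code was available at review time. As a rough illustration only, the sketch below shows a generic performance-based search-space reduction loop that is consistent with the hyper-parameters quoted above (L = 5 training epochs per step, T = 3 reduction rounds, a held-out validation subset); the operation list, pruning rule, and all function names are hypothetical stand-ins, not the authors' algorithm.

```python
# Illustrative sketch only: generic performance-based search-space reduction.
# Operation names, scoring, and the halving rule are hypothetical.
import random

# Hypothetical candidate operations for one edge of a searched cell.
OPS = ["none", "skip_connect", "max_pool_3x3", "avg_pool_3x3",
       "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"]

L = 5   # supernet training epochs per search step (from the quote)
T = 3   # number of search-space reduction rounds (from the quote)

def train_supernet(ops, epochs):
    """Stand-in for jointly training binarized weights and architecture
    hyper-parameters on half of the CIFAR-10 training set."""
    pass

def validation_score(op):
    """Stand-in for scoring an operation on the 5K held-out validation
    images (line 8 of Algorithm 1); returns a random value here."""
    return random.random()

search_space = list(OPS)
for t in range(T):
    train_supernet(search_space, epochs=L)
    # Rank candidate operations by validation performance and drop the
    # weakest ones, shrinking the search space each round.
    scores = {op: validation_score(op) for op in search_space}
    search_space = sorted(search_space, key=scores.get, reverse=True)
    search_space = search_space[: max(1, len(search_space) // 2)]

print("Remaining operations:", search_space)
```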
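
The split quoted in the "Dataset Splits" row maps onto a standard sampler-based split in PyTorch. The snippet below is a sketch under that assumption; the loader configuration and the way the 5K validation images are drawn are illustrative choices, not the authors' code.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

# CIFAR-10 training set (50K images), as referenced in the quote above.
train_data = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

num_train = len(train_data)                  # 50,000
indices = torch.randperm(num_train).tolist()
split = num_train // 2                       # two subsets of equal size

# First half trains the (binarized) network weights; second half is used
# to update the architecture hyper-parameters.
weight_loader = DataLoader(train_data, batch_size=128,
                           sampler=SubsetRandomSampler(indices[:split]))
arch_loader = DataLoader(train_data, batch_size=128,
                         sampler=SubsetRandomSampler(indices[split:]))

# 5K images randomly drawn from the training set as a validation subset
# when reducing the search space (how they are drawn is an assumption).
val_indices = torch.randperm(num_train)[:5000].tolist()
val_loader = DataLoader(train_data, batch_size=128,
                        sampler=SubsetRandomSampler(val_indices))
```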
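
The optimizer settings quoted in the "Experiment Setup" row correspond to standard PyTorch components. The snippet below is a minimal sketch assuming a DARTS-style split between network weights and architecture hyper-parameters; `model`, `arch_parameters`, and the choice of Adam for the architecture optimizer are assumptions not stated in the quote.

```python
import torch
import torch.nn as nn

# Hypothetical placeholder for the searched (binarized) supernet; the real
# model was not released at review time. 16 initial channels per the quote.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 10))
arch_parameters = [torch.zeros(14, 8, requires_grad=True)]  # hypothetical shape

epochs = 60  # the quote states the network is trained for fewer than 60 epochs

# Network weights: SGD with momentum, cosine-annealed learning rate.
w_optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                              momentum=0.9, weight_decay=5e-4)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=epochs, eta_min=0.0)  # annealed down to zero

# Architecture hyper-parameters: learning rate 0.01 as quoted; the optimizer
# type itself is an assumption.
a_optimizer = torch.optim.Adam(arch_parameters, lr=0.01)
```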