Binarized Neural Architecture Search

Authors: Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, David Doermann, Rongrong Ji

AAAI 2020, pp. 10526-10533

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed BNAS achieves a performance comparable to NAS on both the CIFAR and ImageNet databases.
Researcher Affiliation | Academia | Beihang University; Xiamen University; University at Buffalo; Shenzhen Institutes of Advanced Technology, University of Chinese Academy of Sciences
Pseudocode | Yes | Algorithm 1: Performance-Based Search (an illustrative sketch follows the table).
Open Source Code | No | Both methods are easily implemented in our BNAS framework, and the source code will be publicly available soon.
Open Datasets | Yes | First, most experiments are conducted on CIFAR-10 (Krizhevsky, Hinton, and others 2009)... Then we further perform experiments to search architectures directly on ImageNet... evaluate the classification accuracy on ILSVRC 2012 ImageNet (Russakovsky et al. 2015)
Dataset Splits | Yes | During architecture search, the 50K training samples of CIFAR-10 are divided into two subsets of equal size, one for training the network weights and the other for finding the architecture hyper-parameters. When reducing the search space, we randomly select 5K images from the training set as a validation set (used in line 8 of Algorithm 1). A data-split sketch follows the table.
Hardware Specification | Yes | Our BNAS is 40% faster (tested on our platform, an NVIDIA GTX TITAN Xp).
Software Dependencies | No | All the experiments and models are implemented in PyTorch (Paszke et al. 2017).
Experiment Setup | Yes | We set the hyper-parameter C in PC-DARTS to 2 for CIFAR-10... The batch size is set to 128 during the search of an architecture for L = 5 epochs... We also set T = 3 (line 4 in Algorithm 1) and V = 1 (line 14), so the network is trained for fewer than 60 epochs, with a larger batch size of 400... The initial number of channels is 16. We use SGD with momentum to optimize the network weights, with an initial learning rate of 0.025 (annealed down to zero following a cosine schedule), a momentum of 0.9, and a weight decay of 5 × 10⁻⁴. The learning rate for finding the hyper-parameters is set to 0.01. A training-setup sketch follows the table.
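
The paper's pseudocode (Algorithm 1: Performance-Based Search) is not reproduced in this report, and no source code was available at review time. As a rough illustration only, the sketch below shows a generic performance-based search-space reduction loop that is consistent with the hyper-parameters quoted above (L = 5 training epochs per step, T = 3 reduction rounds, a held-out validation subset); the operation list, pruning rule, and all function names are hypothetical stand-ins, not the authors' algorithm.

```python
# Illustrative sketch only: generic performance-based search-space reduction.
# Operation names, scoring, and the halving rule are hypothetical.
import random

# Hypothetical candidate operations for one edge of a searched cell.
OPS = ["none", "skip_connect", "max_pool_3x3", "avg_pool_3x3",
       "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"]

L = 5   # supernet training epochs per search step (from the quote)
T = 3   # number of search-space reduction rounds (from the quote)

def train_supernet(ops, epochs):
    """Stand-in for jointly training binarized weights and architecture
    hyper-parameters on half of the CIFAR-10 training set."""
    pass

def validation_score(op):
    """Stand-in for scoring an operation on the 5K held-out validation
    images (line 8 of Algorithm 1); returns a random value here."""
    return random.random()

search_space = list(OPS)
for t in range(T):
    train_supernet(search_space, epochs=L)
    # Rank candidate operations by validation performance and drop the
    # weakest ones, shrinking the search space each round.
    scores = {op: validation_score(op) for op in search_space}
    search_space = sorted(search_space, key=scores.get, reverse=True)
    search_space = search_space[: max(1, len(search_space) // 2)]

print("Remaining operations:", search_space)
```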
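
The split quoted in the "Dataset Splits" row maps onto a standard sampler-based split in PyTorch. The snippet below is a sketch under that assumption; the loader configuration and the way the 5K validation images are drawn are illustrative choices, not the authors' code.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

# CIFAR-10 training set (50K images), as referenced in the quote above.
train_data = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

num_train = len(train_data)                  # 50,000
indices = torch.randperm(num_train).tolist()
split = num_train // 2                       # two subsets of equal size

# First half trains the (binarized) network weights; second half is used
# to update the architecture hyper-parameters.
weight_loader = DataLoader(train_data, batch_size=128,
                           sampler=SubsetRandomSampler(indices[:split]))
arch_loader = DataLoader(train_data, batch_size=128,
                         sampler=SubsetRandomSampler(indices[split:]))

# 5K images randomly drawn from the training set as a validation subset
# when reducing the search space (how they are drawn is an assumption).
val_indices = torch.randperm(num_train)[:5000].tolist()
val_loader = DataLoader(train_data, batch_size=128,
                        sampler=SubsetRandomSampler(val_indices))
```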
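
The optimizer settings quoted in the "Experiment Setup" row correspond to standard PyTorch components. The snippet below is a minimal sketch assuming a DARTS-style split between network weights and architecture hyper-parameters; `model`, `arch_parameters`, and the choice of Adam for the architecture optimizer are assumptions not stated in the quote.

```python
import torch
import torch.nn as nn

# Hypothetical placeholder for the searched (binarized) supernet; the real
# model was not released at review time. 16 initial channels per the quote.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 10))
arch_parameters = [torch.zeros(14, 8, requires_grad=True)]  # hypothetical shape

epochs = 60  # the quote states the network is trained for fewer than 60 epochs

# Network weights: SGD with momentum, cosine-annealed learning rate.
w_optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                              momentum=0.9, weight_decay=5e-4)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=epochs, eta_min=0.0)  # annealed down to zero

# Architecture hyper-parameters: learning rate 0.01 as quoted; the optimizer
# type itself is an assumption.
a_optimizer = torch.optim.Adam(arch_parameters, lr=0.01)
```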