How to Train a Compact Binary Neural Network with High Accuracy?

Authors: Wei Tang, Gang Hua, Liang Wang

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our findings first reveal that a low learning rate is highly preferred to avoid frequent sign changes of the weights, which often makes the learning of Binary Nets unstable... The composition of all these enables us to train Binary Nets with both high compression rate and high accuracy, which is strongly supported by our extensive empirical study." "Table 1: Comparison of different methods on ImageNet dataset." (The effect of the learning rate on weight sign flips is sketched after the table.)
Researcher Affiliation | Collaboration | Wei Tang (1), Gang Hua (4), Liang Wang (1,2,3); 1: Institute of Automation, Chinese Academy of Sciences (CASIA); 2: Center for Excellence in Brain Science and Intelligence Technology, CAS; 3: University of Chinese Academy of Sciences; 4: Microsoft Research, Beijing, China
Pseudocode | Yes | "Algorithm 1 Training a L layers Binary Net." (A generic training-loop sketch follows the table.)
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, or a link to a code repository for the described methodology.
Open Datasets | Yes | "The image classification task on the large-scale ImageNet dataset..." Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. 2015. ImageNet large scale visual recognition challenge.
Dataset Splits | Yes | "We hold out part of training images for hyper-parameter tuning and the final model is evaluated on the validation dataset with only single center crop." (A hold-out and center-crop sketch follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper only states "We implement our work on Caffe (Jia et al. 2014)"; no version numbers or other dependency details are reported.
Experiment Setup | Yes | "For the hyper-parameters, unless otherwise specified, the initial learning rate is set to 0.0001 and divided by 2 once the training loss stops decreasing. The parameter λ is set to 5×10^-7 and the batch size is set to 256. Just as previous works did, a batch normalization layer is used before each binary convolution layer and ADAM is used as the solver." (A hedged configuration sketch follows the table.)
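
The low-learning-rate finding quoted under Research Type can be illustrated with a small worked example. The weight value, gradient, and learning rates below are hypothetical and not taken from the paper; the point is only that a large step easily flips the sign of a near-zero real-valued weight, which in turn changes its binarized value.

```python
# Hypothetical numbers (not from the paper): a near-zero real-valued proxy
# weight and a unit gradient, updated with two different learning rates.
w, grad = 0.003, 1.0

for lr in (1e-2, 1e-4):
    w_new = w - lr * grad
    flipped = (w_new > 0) != (w > 0)
    print(f"lr={lr:g}: {w:+.4f} -> {w_new:+.4f}, sign flip: {flipped}")

# lr=0.01 flips the sign (so the binarized weight changes); lr=0.0001 does not,
# which is why a low learning rate keeps Binary Net training more stable.
```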
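The Pseudocode row refers to the paper's Algorithm 1 for training an L-layer Binary Net. The sketch below is not a transcription of that algorithm; it is a minimal PyTorch illustration of the general binary-weight training scheme it belongs to (store real-valued weights, binarize them with the sign function for the forward pass, and back-propagate through the binarization with a straight-through estimator). The class and layer names are mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator for the gradient."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass the gradient through only where the real-valued weight lies in [-1, 1].
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(nn.Linear):
    """Linear layer whose forward pass uses sign-binarized weights."""
    def forward(self, x):
        return F.linear(x, BinarizeSTE.apply(self.weight), self.bias)

# One training step: real-valued weights are stored and updated by the optimizer,
# while the forward pass sees only their binarized (+1/-1) version.
model = nn.Sequential(BinaryLinear(16, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x, y = torch.randn(4, 16), torch.randint(0, 10, (4,))
loss = F.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
```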
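For the Dataset Splits row, the reported protocol (hold out part of the ImageNet training images for hyper-parameter tuning, then evaluate the final model on the validation set with a single center crop) could look like the sketch below. The directory paths, hold-out size, and crop sizes are assumptions; the paper does not specify them.

```python
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Assumed paths, hold-out size, and crop sizes; none of these are reported in the paper.
train_tf = transforms.Compose([transforms.RandomResizedCrop(224),
                               transforms.ToTensor()])
eval_tf = transforms.Compose([transforms.Resize(256),
                              transforms.CenterCrop(224),  # single center crop
                              transforms.ToTensor()])

full_train = datasets.ImageFolder("imagenet/train", transform=train_tf)
n_heldout = 50_000  # hypothetical size of the hyper-parameter tuning split
train_set, tune_set = random_split(full_train,
                                   [len(full_train) - n_heldout, n_heldout])

val_set = datasets.ImageFolder("imagenet/val", transform=eval_tf)  # final evaluation
```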
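Finally, the Experiment Setup row lists concrete hyper-parameters. Below is a hedged PyTorch-style configuration sketch of those values (initial learning rate 1e-4 halved when the training loss plateaus, λ = 5e-7, batch size 256, ADAM, batch normalization before each binary convolution). It is not the authors' Caffe setup, and an ordinary convolution stands in for the binarized one.

```python
import torch
import torch.nn as nn

LAMBDA = 5e-7      # coefficient of the paper's regularization term
BATCH_SIZE = 256

def bn_then_conv(in_ch, out_ch):
    # BatchNorm is placed before each binary convolution, as stated in the paper;
    # a real-valued nn.Conv2d stands in here for the binarized convolution.
    return nn.Sequential(nn.BatchNorm2d(in_ch),
                         nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))

model = nn.Sequential(bn_then_conv(3, 64),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1000))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # ADAM, lr = 0.0001
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5)

# Inside the training loop:
#   total_loss = task_loss + LAMBDA * reg_term   # reg_term as defined in the paper
#   scheduler.step(epoch_train_loss)             # halves lr once the loss stops decreasing
```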