Neural Networks with Few Multiplications

Authors: Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio

ICLR 2016

Reproducibility assessment: each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental. LLM Response: Experimental results across 3 popular datasets (MNIST, CIFAR10, SVHN) show that this approach not only does not hurt classification performance but can result in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks.
Researcher Affiliation: Academia. LLM Response: Zhouhan Lin (Université de Montréal, Canada, zhouhan.lin@umontreal.ca); Matthieu Courbariaux (Université de Montréal, Canada, matthieu.courbariaux@gmail.com); Roland Memisevic (Université de Montréal, Canada, roland.umontreal@gmail.com); Yoshua Bengio (Université de Montréal, Canada).
Pseudocode: Yes. LLM Response: Algorithm 1, Quantized Back Propagation (QBP). A rough sketch of the quantization steps the algorithm relies on is given below.
Open Source Code: Yes. LLM Response: The codes for these approaches are available online at https://github.com/hantek/BinaryConnect.
Open Datasets: Yes. LLM Response: We experimented with 3 datasets: MNIST, CIFAR10, and SVHN.
Dataset Splits: Yes. LLM Response: The training set is separated into two parts: a training set of 40,000 images and a validation set of 10,000 images. A minimal split sketch is given below.
Hardware Specification: No. LLM Response: The paper mentions training on "GPU or CPU clusters" but does not specify the hardware used for the experiments (e.g., GPU/CPU models or memory).
Software Dependencies: No. LLM Response: The paper states that the implementation uses Theano (Bastien et al., 2012), naming the software but not a specific Theano version.
Experiment Setup: Yes. LLM Response: All models are trained with stochastic gradient descent (SGD) without momentum, and batch normalization is used in all models to accelerate learning. At training time, binary (ternary) connect and quantized back propagation are applied; at test time, the learned full-resolution weights are used for forward propagation. For each dataset, all hyper-parameters are kept identical across methods, except that the learning rate is adapted independently for each one. The MNIST model is a fully connected network with 4 layers: 784-1024-1024-1024-10. The training set is separated into a training set of 40,000 images and a validation set of 10,000 images. Training uses mini-batches of size 200. A sketch of this setup is given below.
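
Sketch for the pseudocode entry: Algorithm 1 is not reproduced here, but the two quantization steps it builds on are (i) stochastic binarization of the weights for the forward pass and (ii) power-of-two quantization of layer inputs so that backward-pass multiplications become bit shifts. The NumPy sketch below illustrates both; the function names and exponent range are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def binarize_weights(w, rng=np.random):
    # Stochastic binarization as in binary connect: sample +1 with
    # probability p = hard_sigmoid(w) = clip((w + 1) / 2, 0, 1), else -1.
    p = np.clip((w + 1.0) / 2.0, 0.0, 1.0)
    return np.where(rng.uniform(size=w.shape) < p, 1.0, -1.0)

def quantize_to_power_of_two(x, min_exp=-8, max_exp=0):
    # Keep the sign of each entry and round the base-2 exponent of its
    # magnitude to the nearest integer, so multiplying by the result can be
    # done with bit shifts. The exponent range is an illustrative choice.
    sign = np.sign(x)
    exp = np.round(np.log2(np.maximum(np.abs(x), 2.0 ** min_exp)))
    exp = np.clip(exp, min_exp, max_exp)
    return sign * 2.0 ** exp
```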
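
Sketch for the dataset-splits entry: the 40,000/10,000 sizes are from the paper; holding out the last 10,000 examples and the placeholder array shapes are assumptions.

```python
import numpy as np

# Placeholder arrays standing in for the official 50,000-example training set.
train_x = np.zeros((50000, 784), dtype=np.float32)
train_y = np.zeros((50000,), dtype=np.int64)

# Keep 40,000 examples for training and hold out 10,000 for validation.
valid_x, valid_y = train_x[40000:], train_y[40000:]
train_x, train_y = train_x[:40000], train_y[:40000]
```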
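
Sketch for the experiment-setup entry: a minimal version of the 784-1024-1024-1024-10 fully connected model with plain SGD (no momentum) and mini-batches of 200. Batch normalization and the gradient computation are omitted for brevity; the ReLU activations, weight initialization, and learning rate are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.RandomState(0)

# 784-1024-1024-1024-10 fully connected network from the MNIST setup.
layer_sizes = [784, 1024, 1024, 1024, 10]
weights = [rng.normal(0.0, 0.01, size=(m, n)).astype(np.float32)
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n, dtype=np.float32) for n in layer_sizes[1:]]

batch_size = 200  # mini-batch size used in the experiments

def forward(x):
    # Full-resolution forward pass; at training time the paper instead
    # samples binary/ternary weights (see the first sketch above).
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)  # assumed ReLU hidden activations
    return h @ weights[-1] + biases[-1]  # class logits

def sgd_step(grads_w, grads_b, lr=0.1):
    # SGD without momentum, as stated in the setup; the learning rate is
    # illustrative (the paper adapts it independently per method).
    for W, gW in zip(weights, grads_w):
        W -= lr * gW
    for b, gb in zip(biases, grads_b):
        b -= lr * gb
```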