Neural Networks with Few Multiplications
Authors: Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across 3 popular datasets (MNIST, CIFAR10, SVHN) show that this approach not only does not hurt classification performance but can result in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks. |
| Researcher Affiliation | Academia | Zhouhan Lin, Université de Montréal, Canada, zhouhan.lin@umontreal.ca; Matthieu Courbariaux, Université de Montréal, Canada, matthieu.courbariaux@gmail.com; Roland Memisevic, Université de Montréal, Canada, roland.umontreal@gmail.com; Yoshua Bengio, Université de Montréal, Canada |
| Pseudocode | Yes | Algorithm 1 Quantized Back Propagation (QBP). |
| Open Source Code | Yes | The codes for these approaches are available online at https://github.com/hantek/BinaryConnect |
| Open Datasets | Yes | We experimented with 3 datasets: MNIST, CIFAR10, and SVHN. |
| Dataset Splits | Yes | The training set is separated into two parts, one of which is the training set with 40000 images and the other the validation set with 10000 images. |
| Hardware Specification | No | The paper mentions training on "GPU or CPU clusters" but does not provide specific hardware models (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper states "Our implementation uses Theano (Bastien et al., 2012)", which names the software but does not provide a specific version number for Theano. |
| Experiment Setup | Yes | All models are trained with stochastic gradient descent (SGD) without momentum. We use batch normalization for all the models to accelerate learning. At training time, binary (ternary) connect and quantized back propagation are used, while at test time, we use the learned full resolution weights for the forward propagation. For each dataset, all hyper-parameters are set to the same values for the different methods, except that the learning rate is adapted independently for each one. The MNIST model uses a fully connected network with 4 layers: 784-1024-1024-1024-10. The training set is separated into two parts, one of which is the training set with 40000 images and the other the validation set with 10000 images. Training is conducted in a mini-batch way, with a batch size of 200. |
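
The "Pseudocode" row above refers to Algorithm 1, Quantized Back Propagation (QBP). As a rough illustration of its weight-sampling step, the following is a minimal NumPy sketch of BinaryConnect-style stochastic binarization, assuming the hard-sigmoid probability p = clip((w + 1)/2, 0, 1). The names `hard_sigmoid` and `stochastic_binarize` are placeholders, and the full algorithm's quantization of back-propagated errors to powers of two is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_sigmoid(w):
    """Clip (w + 1) / 2 into [0, 1] to get the probability of sampling +1."""
    return np.clip((w + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(w):
    """Sample binary weights in {-1, +1}; +1 with probability hard_sigmoid(w)."""
    return np.where(rng.random(w.shape) < hard_sigmoid(w), 1.0, -1.0)

# The sampled binary weights are used in the forward pass during training;
# the real-valued weights are the ones updated by SGD and kept for test time.
W_real = 0.01 * rng.standard_normal((784, 1024))
W_binary = stochastic_binarize(W_real)
```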
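Likewise, the "Experiment Setup" row describes the MNIST model as a 784-1024-1024-1024-10 fully connected network trained with SGD without momentum, batch normalization, and mini-batches of 200. The sketch below re-creates that configuration in PyTorch purely for illustration (the authors' implementation uses Theano); the ReLU activations and the learning rate are assumptions not given in the quoted text, and during training the binary/ternary sampled weights shown above would stand in for the full-resolution weights in the forward pass.

```python
import torch
import torch.nn as nn

# Hypothetical re-creation of the quoted MNIST setup: layer sizes
# 784-1024-1024-1024-10, batch normalization, SGD without momentum.
model = nn.Sequential(
    nn.Linear(784, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0)  # lr is a placeholder
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    """One mini-batch update (batch size 200 in the paper's MNIST setup)."""
    optimizer.zero_grad()
    loss = criterion(model(x.view(-1, 784)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```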