BinaryConnect: Training Deep Neural Networks with binary weights during propagations
Authors: Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that BinaryConnect is a regularizer and we obtain near state-of-the-art results on the permutation-invariant MNIST, CIFAR-10 and SVHN (Section 3). Table 2: Test error rates of DNNs trained on the MNIST (no convolution and no unsupervised pretraining), CIFAR-10 (no data augmentation) and SVHN, depending on the method. |
| Researcher Affiliation | Academia | Matthieu Courbariaux, École Polytechnique de Montréal, matthieu.courbariaux@polymtl.ca; Yoshua Bengio, Université de Montréal, CIFAR Senior Fellow, yoshua.bengio@gmail.com; Jean-Pierre David, École Polytechnique de Montréal, jean-pierre.david@polymtl.ca |
| Pseudocode | Yes | Algorithm 1: SGD training with BinaryConnect. C is the cost function for the minibatch and the functions binarize(w) and clip(w) specify how to binarize and clip the weights. L is the number of layers. (A minimal training-loop sketch appears after this table.) |
| Open Source Code | Yes | The main contributions of this article are the following. [...] We make the code for BinaryConnect available 1. [Footnote 1: https://github.com/MatthieuCourbariaux/BinaryConnect] |
| Open Datasets | Yes | We obtain near state-of-the-art results with BinaryConnect on the permutation-invariant MNIST, CIFAR-10 and SVHN. MNIST is a benchmark image classification dataset [33]. CIFAR-10 is a benchmark image classification dataset. SVHN is a benchmark image classification dataset. |
| Dataset Splits | Yes | As typically done, we use the last 10000 samples of the training set as a validation set for early stopping and model selection. We use the last 5000 samples of the training set as a validation set. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments were provided. The paper generally discusses GPUs and specialized hardware but not the specific setup for their experiments. |
| Software Dependencies | No | No specific ancillary software details including version numbers were provided. The paper mentions Theano, Pylearn2, and Lasagne, but without version numbers: 'We thank the developers of Theano [42, 43], a Python library which allowed us to easily develop a fast and optimized code for GPU. We also thank the developers of Pylearn2 [44] and Lasagne, two Deep Learning libraries built on the top of Theano.' |
| Experiment Setup | Yes | The MLP we train on MNIST consists of 3 hidden layers of 1024 Rectifier Linear Units (ReLU) [34, 24, 3] and an L2-SVM output layer (L2-SVM has been shown to perform better than Softmax on several classification benchmarks [30, 32]). The square hinge loss is minimized with SGD without momentum. We use an exponentially decaying learning rate. We use Batch Normalization with a minibatch of size 200 to speed up the training. The architecture of our CNN is (2×128C3)-MP2-(2×256C3)-MP2-(2×512C3)-MP2-(2×1024FC)-10SVM, where C3 is a 3×3 ReLU convolution layer, MP2 is a 2×2 max-pooling layer, FC a fully connected layer, and SVM an L2-SVM output layer. This architecture is greatly inspired by VGG [36]. The square hinge loss is minimized with ADAM. We use an exponentially decaying learning rate. We use Batch Normalization with a minibatch of size 50 to speed up the training. We report the test error rate associated with the best validation error rate after 500 training epochs (we do not retrain on the validation set). For SVHN, we use half the number of hidden units and train for 200 epochs instead of 500 (because SVHN is quite a big dataset). (A sketch of this CNN layout appears after this table.) |
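
The Pseudocode row quotes Algorithm 1, which trains with SGD while keeping a real-valued copy of the weights. Below is a minimal NumPy sketch of that loop, not the authors' Theano code: a single linear layer with a squared-error loss stands in for the deep network, and the learning rate, data shapes, and choice of stochastic binarization are illustrative; only `binarize` and `clip` follow the paper's definitions (sign or hard-sigmoid sampling, and clipping the real-valued weights to [-1, 1]).

```python
import numpy as np

def binarize(w, stochastic=False, rng=None):
    """Binarize real-valued weights to {-1, +1}: deterministic sign, or
    stochastic sampling with the hard-sigmoid probability from the paper."""
    if not stochastic:
        return np.where(w >= 0, 1.0, -1.0)
    p = np.clip((w + 1.0) / 2.0, 0.0, 1.0)          # hard sigmoid
    return np.where(rng.random(w.shape) < p, 1.0, -1.0)

def clip(w):
    """Keep the real-valued weights inside [-1, 1] after each update."""
    return np.clip(w, -1.0, 1.0)

def train_step(x, y, W, lr, rng):
    Wb = binarize(W, stochastic=True, rng=rng)      # 1) binarize for propagation
    logits = x @ Wb                                 # 2) forward pass with binary weights
    grad_logits = 2.0 * (logits - y) / len(x)       # 3) backward pass (squared-error loss)
    grad_W = x.T @ grad_logits                      #    gradient w.r.t. the binary weights
    return clip(W - lr * grad_W)                    # 4) update and clip the real weights

# Toy usage: a single 784 -> 10 linear layer on random data (illustrative only).
rng = np.random.default_rng(0)
W = rng.uniform(-0.1, 0.1, size=(784, 10))
x = rng.random((200, 784))
y = np.eye(10)[rng.integers(0, 10, size=200)]
W = train_step(x, y, W, lr=0.01, rng=rng)
```

The key design point, as in the paper, is that the binary weights are used only for the forward and backward propagations, while the update and clipping are applied to a full-precision copy that accumulates the small gradient steps.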
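
For the CNN in the Experiment Setup row, the architecture string reads as three blocks of two 3×3 convolutions followed by 2×2 max-pooling (128, 256, 512 feature maps), then two 1024-unit fully connected layers and a 10-way L2-SVM output trained with a squared hinge loss. The PyTorch sketch below is an illustration under assumptions, not the authors' Theano/Lasagne implementation: the padding, the placement of Batch Normalization, and expressing the SVM head as a plain linear layer are choices made here for concreteness.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 ReLU convolutions followed by 2x2 max-pooling: one '(2 x cC3)-MP2' block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2),
    )

# (2x128C3)-MP2-(2x256C3)-MP2-(2x512C3)-MP2-(2x1024FC)-10SVM for 3x32x32 inputs;
# three pooling stages leave 4x4 feature maps, hence the 512 * 4 * 4 flatten size.
cnn = nn.Sequential(
    conv_block(3, 128),
    conv_block(128, 256),
    conv_block(256, 512),
    nn.Flatten(),
    nn.Linear(512 * 4 * 4, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 10),   # L2-SVM head: linear scores trained with a squared hinge loss
)

def squared_hinge(scores, labels):
    """L2-SVM (square hinge) loss with one-vs-all targets in {-1, +1}."""
    targets = 2.0 * torch.nn.functional.one_hot(labels, 10).float() - 1.0
    return torch.clamp(1.0 - targets * scores, min=0).pow(2).sum(dim=1).mean()

# Usage on a random CIFAR-10-shaped batch.
scores = cnn(torch.randn(8, 3, 32, 32))
loss = squared_hinge(scores, torch.randint(0, 10, (8,)))
```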