BinaryConnect: Training Deep Neural Networks with binary weights during propagations
Authors: Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that BinaryConnect is a regularizer and we obtain near state-of-the-art results on the permutation-invariant MNIST, CIFAR-10 and SVHN (Section 3). Table 2: Test error rates of DNNs trained on the MNIST (no convolution and no unsupervised pretraining), CIFAR-10 (no data augmentation) and SVHN, depending on the method. |
| Researcher Affiliation | Academia | Matthieu Courbariaux, École Polytechnique de Montréal, matthieu.courbariaux@polymtl.ca; Yoshua Bengio, Université de Montréal, CIFAR Senior Fellow, yoshua.bengio@gmail.com; Jean-Pierre David, École Polytechnique de Montréal, jean-pierre.david@polymtl.ca |
| Pseudocode | Yes | Algorithm 1: SGD training with BinaryConnect. C is the cost function for the minibatch and the functions binarize(w) and clip(w) specify how to binarize and clip the weights. L is the number of layers. (A minimal training-loop sketch appears after this table.) |
| Open Source Code | Yes | The main contributions of this article are the following. [...] We make the code for BinaryConnect available 1. [Footnote 1: https://github.com/MatthieuCourbariaux/BinaryConnect] |
| Open Datasets | Yes | We obtain near state-of-the-art results with BinaryConnect on the permutation-invariant MNIST, CIFAR-10 and SVHN. MNIST is a benchmark image classification dataset [33]. CIFAR-10 is a benchmark image classification dataset. SVHN is a benchmark image classification dataset. |
| Dataset Splits | Yes | As typically done, we use the last 10000 samples of the training set as a validation set for early stopping and model selection. We use the last 5000 samples of the training set as a validation set. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments were provided. The paper generally discusses GPUs and specialized hardware but not the specific setup for their experiments. |
| Software Dependencies | No | No specific ancillary software details including version numbers were provided. The paper mentions Theano, Pylearn2, and Lasagne, but without version numbers: 'We thank the developers of Theano [42, 43], a Python library which allowed us to easily develop a fast and optimized code for GPU. We also thank the developers of Pylearn2 [44] and Lasagne, two Deep Learning libraries built on the top of Theano.' |
| Experiment Setup | Yes | The MLP we train on MNIST consists of 3 hidden layers of 1024 Rectifier Linear Units (ReLU) [34, 24, 3] and an L2-SVM output layer (L2-SVM has been shown to perform better than Softmax on several classification benchmarks [30, 32]). The square hinge loss is minimized with SGD without momentum. We use an exponentially decaying learning rate. We use Batch Normalization with a minibatch of size 200 to speed up the training. The architecture of our CNN is (2×128C3)-MP2-(2×256C3)-MP2-(2×512C3)-MP2-(2×1024FC)-10SVM, where C3 is a 3×3 ReLU convolution layer, MP2 is a 2×2 max-pooling layer, FC a fully connected layer, and SVM an L2-SVM output layer. This architecture is greatly inspired by VGG [36]. The square hinge loss is minimized with ADAM. We use an exponentially decaying learning rate. We use Batch Normalization with a minibatch of size 50 to speed up the training. We report the test error rate associated with the best validation error rate after 500 training epochs (we do not retrain on the validation set). For SVHN, we use half the number of hidden units and train for 200 epochs instead of 500 (because SVHN is quite a big dataset). (A sketch of this CNN layout appears after this table.) |
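
The Pseudocode row quotes Algorithm 1, which trains with SGD while keeping a real-valued copy of the weights. Below is a minimal NumPy sketch of that loop, not the authors' Theano code: a single linear layer with a squared-error loss stands in for the deep network, and the learning rate, data shapes, and choice of stochastic binarization are illustrative; only `binarize` and `clip` follow the paper's definitions (sign or hard-sigmoid sampling, and clipping the real-valued weights to [-1, 1]).

```python
import numpy as np

def binarize(w, stochastic=False, rng=None):
    """Binarize real-valued weights to {-1, +1}: deterministic sign, or
    stochastic sampling with the hard-sigmoid probability from the paper."""
    if not stochastic:
        return np.where(w >= 0, 1.0, -1.0)
    p = np.clip((w + 1.0) / 2.0, 0.0, 1.0)          # hard sigmoid
    return np.where(rng.random(w.shape) < p, 1.0, -1.0)

def clip(w):
    """Keep the real-valued weights inside [-1, 1] after each update."""
    return np.clip(w, -1.0, 1.0)

def train_step(x, y, W, lr, rng):
    Wb = binarize(W, stochastic=True, rng=rng)      # 1) binarize for propagation
    logits = x @ Wb                                 # 2) forward pass with binary weights
    grad_logits = 2.0 * (logits - y) / len(x)       # 3) backward pass (squared-error loss)
    grad_W = x.T @ grad_logits                      #    gradient w.r.t. the binary weights
    return clip(W - lr * grad_W)                    # 4) update and clip the real weights

# Toy usage: a single 784 -> 10 linear layer on random data (illustrative only).
rng = np.random.default_rng(0)
W = rng.uniform(-0.1, 0.1, size=(784, 10))
x = rng.random((200, 784))
y = np.eye(10)[rng.integers(0, 10, size=200)]
W = train_step(x, y, W, lr=0.01, rng=rng)
```

The key design point, as in the paper, is that the binary weights are used only for the forward and backward propagations, while the update and clipping are applied to a full-precision copy that accumulates the small gradient steps.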
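
For the CNN in the Experiment Setup row, the architecture string reads as three blocks of two 3×3 convolutions followed by 2×2 max-pooling (128, 256, 512 feature maps), then two 1024-unit fully connected layers and a 10-way L2-SVM output trained with a squared hinge loss. The PyTorch sketch below is an illustration under assumptions, not the authors' Theano/Lasagne implementation: the padding, the placement of Batch Normalization, and expressing the SVM head as a plain linear layer are choices made here for concreteness.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 ReLU convolutions followed by 2x2 max-pooling: one '(2 x cC3)-MP2' block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2),
    )

# (2x128C3)-MP2-(2x256C3)-MP2-(2x512C3)-MP2-(2x1024FC)-10SVM for 3x32x32 inputs;
# three pooling stages leave 4x4 feature maps, hence the 512 * 4 * 4 flatten size.
cnn = nn.Sequential(
    conv_block(3, 128),
    conv_block(128, 256),
    conv_block(256, 512),
    nn.Flatten(),
    nn.Linear(512 * 4 * 4, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 10),   # L2-SVM head: linear scores trained with a squared hinge loss
)

def squared_hinge(scores, labels):
    """L2-SVM (square hinge) loss with one-vs-all targets in {-1, +1}."""
    targets = 2.0 * torch.nn.functional.one_hot(labels, 10).float() - 1.0
    return torch.clamp(1.0 - targets * scores, min=0).pow(2).sum(dim=1).mean()

# Usage on a random CIFAR-10-shaped batch.
scores = cnn(torch.randn(8, 3, 32, 32))
loss = squared_hinge(scores, torch.randint(0, 10, (8,)))
```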