Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Networks with Few Multiplications
Authors: Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across 3 popular datasets (MNIST, CIFAR10, SVHN) show that this approach not only does not hurt classification performance but can result in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks. |
| Researcher Affiliation | Academia | Zhouhan Lin, Université de Montréal, Canada; Matthieu Courbariaux, Université de Montréal, Canada; Roland Memisevic, Université de Montréal, Canada; Yoshua Bengio, Université de Montréal, Canada |
| Pseudocode | Yes | Algorithm 1 Quantized Back Propagation (QBP). |
| Open Source Code | Yes | The codes for these approaches are available online at https://github.com/hantek/BinaryConnect |
| Open Datasets | Yes | We experimented with 3 datasets: MNIST, CIFAR10, and SVHN. |
| Dataset Splits | Yes | The training set is separated into two parts, one of which is the training set with 40000 images and the other the validation set with 10000 images. |
| Hardware Specification | No | The paper mentions training on "GPU or CPU clusters" but does not provide specific hardware models (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | Our implementation uses Theano (Bastien et al., 2012). This mentions software but does not provide a specific version number for Theano. |
| Experiment Setup | Yes | All models are trained with stochastic gradient descent (SGD) without momentum. We use batch normalization for all the models to accelerate learning. At training time, binary (ternary) connect and quantized back propagation are used, while at test time, we use the learned full resolution weights for the forward propagation. For each dataset, all hyper-parameters are set to the same values for the different methods, except that the learning rate is adapted independently for each one. The MNIST model uses a fully connected network with 4 layers: 784-1024-1024-1024-10. The training set is separated into two parts, one of which is the training set with 40000 images and the other the validation set with 10000 images. Training is conducted in a mini-batch way, with a batch size of 200. |
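The binary-connect scheme quoted above replaces full-resolution weights with values in {-1, +1} during the forward pass, sampled stochastically from a hard-sigmoid probability. A minimal NumPy sketch of that sampling step is below; the function name `stochastic_binarize` is illustrative, not from the authors' Theano code.

```python
import numpy as np

def stochastic_binarize(W, rng=np.random):
    """Stochastically binarize weights to {-1, +1}.

    P(w_b = +1) follows the hard sigmoid clip((w + 1) / 2, 0, 1),
    as in binary connect. Illustrative sketch only; the paper's
    actual implementation uses Theano.
    """
    p = np.clip((W + 1.0) / 2.0, 0.0, 1.0)  # hard sigmoid in [0, 1]
    # Sample +1 with probability p, else -1
    return np.where(rng.random_sample(W.shape) < p, 1.0, -1.0)
```

At test time, per the setup quoted above, the learned full-resolution weights are used instead of the binarized samples.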
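The experiment-setup cell fixes several concrete values: a 784-1024-1024-1024-10 fully connected network, a 40000/10000 train/validation split of the MNIST training pool, mini-batches of 200, and plain SGD without momentum. A small sketch of those ingredients, assuming a 50000-image training pool (helper names are mine, not the authors'):

```python
import numpy as np

# Layer sizes for the paper's MNIST model: 784-1024-1024-1024-10.
LAYER_SIZES = [784, 1024, 1024, 1024, 10]
BATCH_SIZE = 200  # mini-batch size reported in the paper

def split_train_valid(images, n_train=40000, n_valid=10000):
    """Split the MNIST training pool into the 40000-image training
    set and 10000-image validation set described in the paper."""
    assert len(images) >= n_train + n_valid
    return images[:n_train], images[n_train:n_train + n_valid]

def sgd_step(params, grads, lr):
    """Plain SGD update without momentum, as used for all models."""
    return [w - lr * g for w, g in zip(params, grads)]
```

The learning rate `lr` is the one hyper-parameter the paper tunes independently per method; everything else is held fixed across methods for a given dataset.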