Learning Discrete Weights Using the Local Reparameterization Trick

Authors: Oran Shayer, Dan Levi, Ethan Fetaya

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using the proposed training we test both binary and ternary models on MNIST, CIFAR-10 and ImageNet benchmarks and reach state-of-the-art results on most experiments."
Researcher Affiliation | Collaboration | Oran Shayer (General Motors Advanced Technical Center Israel; Department of Electrical Engineering, Technion) oran.sh@gmail.com; Dan Levi (General Motors Advanced Technical Center Israel) dan.levi@gm.com; Ethan Fetaya (University of Toronto; Vector Institute) ethanf@cs.toronto.edu
Pseudocode | Yes | Algorithm 1: Discrete layer forward pass (a minimal sketch follows the table)
Open Source Code | No | The information is insufficient. The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | "We conducted extensive experiments on the MNIST, CIFAR-10 and ImageNet (ILSVRC2012) benchmarks. MNIST is an image classification benchmark dataset, containing 60K training images and 10K test images from 10 classes of digits 0-9. CIFAR-10 is an image classification benchmark dataset (Krizhevsky, 2009), containing 50K training images and 10K test images from 10 classes. ImageNet 2012 (ILSVRC2012) is a large-scale image classification dataset (Deng et al., 2009), consisting of 1.28 million training images and 50K validation images from 1000 classes."
Dataset Splits | Yes | "MNIST is an image classification benchmark dataset, containing 60K training images and 10K test images from 10 classes of digits 0-9. CIFAR-10 is an image classification benchmark dataset (Krizhevsky, 2009), containing 50K training images and 10K test images from 10 classes. ImageNet 2012 (ILSVRC2012) is a large-scale image classification dataset (Deng et al., 2009), consisting of 1.28 million training images and 50K validation images from 1000 classes." (A hedged loading sketch follows the table.)
Hardware Specification | No | The information is insufficient. The paper does not specify the GPU or CPU models, or any other hardware, used for the experiments.
Software Dependencies | No | The information is insufficient. The paper mentions using Adam and Batch Normalization, but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | MNIST: "We use a batch size of 256, initial learning rate is 0.01 and is divided by 10 after 100 epochs of training. For the binary setting, beta parameter is set to 1e-6. For the ternary setting, probability decay parameter is set to 1e-11. We report the test error rate after 190 training epochs." ImageNet: "Weight decay parameter is set to 1e-5, we use a batch size of 256 and initial learning rate is 0.01. For the binary setting, we found that the beta regularizer is not needed and got the best results when beta parameter is set to 0. Learning rate is divided by 10 after 50 and 60 epochs and we report the test error rate after 65 training epochs. For the ternary setting, probability decay parameter is set to 1e-12. For this setting, learning rate is divided by 10 after 30 and 44 epochs and we report the test error rate after 55 training epochs." (A hedged configuration sketch follows the table.)
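
The Pseudocode row above refers to the paper's Algorithm 1, a discrete layer forward pass based on the local reparameterization trick: the pre-activation of a layer with stochastic ternary weights is approximated by a Gaussian whose mean and variance are computed from the per-weight distributions. Below is a minimal NumPy sketch of that idea; the function and parameter names (discrete_layer_forward, p_neg, p_pos) are illustrative and not taken from the authors' code.

```python
import numpy as np

def discrete_layer_forward(x, p_neg, p_pos, rng=None):
    """Sketch of a ternary discrete layer forward pass with the local
    reparameterization trick.

    x     : (batch, in_features) input activations
    p_neg : (in_features, out_features) probabilities P(w = -1)
    p_pos : (in_features, out_features) probabilities P(w = +1)
            with P(w = 0) = 1 - p_neg - p_pos
    """
    rng = np.random.default_rng() if rng is None else rng
    w_mean = p_pos - p_neg                   # E[w]
    w_var = (p_pos + p_neg) - w_mean ** 2    # Var[w] = E[w^2] - E[w]^2
    m = x @ w_mean                           # pre-activation mean
    v = (x ** 2) @ w_var                     # pre-activation variance
    eps = rng.standard_normal(m.shape)       # standard Gaussian noise
    return m + np.sqrt(v) * eps              # sampled pre-activation (CLT approximation)
```

Sampling as above is only needed during training; at test time the paper evaluates a fixed discrete network whose weights are taken from the learned distributions (e.g., the most probable value per entry).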
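
The dataset rows quote the standard MNIST, CIFAR-10 and ImageNet (ILSVRC2012) splits. As an illustration only (the paper does not say which data-loading library was used), the standard train/test splits with the sizes quoted above can be obtained via torchvision:

```python
# Hedged illustration: standard splits via torchvision, not the authors' pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)    # 60K images
mnist_test  = datasets.MNIST("data", train=False, download=True, transform=to_tensor)   # 10K images

cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)  # 50K images
cifar_test  = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor) # 10K images

# ImageNet (ILSVRC2012) must be downloaded manually; the "train" split holds the
# ~1.28M training images and the "val" split the 50K validation images.
# imagenet_val = datasets.ImageNet("data/imagenet", split="val", transform=to_tensor)
```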
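
The Experiment Setup row quotes the reported hyperparameters. The sketch below shows how the first quoted setting (labeled MNIST in the report) could be wired into an Adam optimizer (the paper mentions Adam) with a step learning-rate schedule. PyTorch is assumed purely for illustration, and the placeholder `model` stands in for whichever binary or ternary network is being trained; the regularizer terms are only noted in comments.

```python
import torch

# Hyperparameters quoted above (first setting, labeled MNIST in the report).
BATCH_SIZE = 256
INIT_LR = 0.01
LR_DROP_EPOCHS = [100]        # learning rate divided by 10 after 100 epochs
TOTAL_EPOCHS = 190
BETA_BINARY = 1e-6            # beta regularizer weight (binary setting)
PROB_DECAY_TERNARY = 1e-11    # probability decay parameter (ternary setting)

model = torch.nn.Linear(784, 10)  # placeholder for the binary/ternary network
optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=LR_DROP_EPOCHS, gamma=0.1)

for epoch in range(TOTAL_EPOCHS):
    # ... run one training epoch with batches of size BATCH_SIZE, adding the
    #     beta / probability-decay regularizer to the loss ...
    optimizer.step()   # stands in for the per-batch updates
    scheduler.step()
```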