Learning Discrete Weights Using the Local Reparameterization Trick
Authors: Oran Shayer, Dan Levi, Ethan Fetaya
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using the proposed training we test both binary and ternary models on MNIST, CIFAR-10 and ImageNet benchmarks and reach state-of-the-art results on most experiments. |
| Researcher Affiliation | Collaboration | Oran Shayer, General Motors Advanced Technical Center Israel and Department of Electrical Engineering, Technion (oran.sh@gmail.com); Dan Levi, General Motors Advanced Technical Center Israel (dan.levi@gm.com); Ethan Fetaya, University of Toronto and Vector Institute (ethanf@cs.toronto.edu) |
| Pseudocode | Yes | Algorithm 1: Discrete layer forward pass (a code sketch of this forward pass follows the table) |
| Open Source Code | No | The information is insufficient. The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We conducted extensive experiments on the MNIST, CIFAR-10 and ImageNet (ILSVRC2012) benchmarks. MNIST is an image classification benchmark dataset, containing 60K training images and 10K test images from 10 classes of digits 0-9. CIFAR-10 is an image classification benchmark dataset (Krizhevsky, 2009), containing 50K training images and 10K test images from 10 classes. ImageNet 2012 (ILSVRC2012) is a large scale image classification dataset (Deng et al., 2009), consisting of 1.28 million training images, and 50K validation images from 1000 classes. |
| Dataset Splits | Yes | MNIST is an image classification benchmark dataset, containing 60K training images and 10K test images from 10 classes of digits 0-9. CIFAR-10 is an image classification benchmark dataset (Krizhevsky, 2009), containing 50K training images and 10K test images from 10 classes. ImageNet 2012 (ILSVRC2012) is a large scale image classification dataset (Deng et al., 2009), consisting of 1.28 million training images, and 50K validation images from 1000 classes. (A dataset-loading sketch follows the table.) |
| Hardware Specification | No | The information is insufficient. The paper does not specify any particular GPU or CPU models, or other specific hardware used for experiments. |
| Software Dependencies | No | The information is insufficient. The paper mentions using Adam and Batch Normalization, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | MNIST: we use a batch size of 256, the initial learning rate is 0.01 and is divided by 10 after 100 epochs of training. For the binary setting, the beta parameter is set to 1e-6. For the ternary setting, the probability decay parameter is set to 1e-11. We report the test error rate after 190 training epochs. ImageNet: the weight decay parameter is set to 1e-5, we use a batch size of 256 and the initial learning rate is 0.01. For the binary setting, we found that the beta regularizer is not needed and got the best results when the beta parameter is set to 0. The learning rate is divided by 10 after 50 and 60 epochs and we report the test error rate after 65 training epochs. For the ternary setting, the probability decay parameter is set to 1e-12. For this setting, the learning rate is divided by 10 after 30 and 44 epochs and we report the test error rate after 55 training epochs. (A configuration sketch based on these values follows the table.) |
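
The Pseudocode row refers to the paper's Algorithm 1, a forward pass through a layer with stochastic discrete weights using the local reparameterization trick: instead of sampling a discrete weight matrix, the pre-activations are sampled from a Gaussian whose mean and variance are induced by the weight distribution. The PyTorch sketch below illustrates this idea for a fully connected layer with ternary weights in {-1, 0, +1}; the parameter names (`a`, `b`) and the sigmoid parameterization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLocalReparamLinear(nn.Module):
    """Fully connected layer with stochastic ternary weights in {-1, 0, +1}.

    Sketch of a discrete-layer forward pass via local reparameterization:
    sample each pre-activation from a Gaussian with the mean and variance
    induced by the per-weight ternary distribution.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        # Unconstrained parameters mapped to probabilities via sigmoid:
        # p(w != 0) = sigmoid(a), p(w = +1 | w != 0) = sigmoid(b)
        self.a = nn.Parameter(torch.zeros(out_features, in_features))
        self.b = nn.Parameter(torch.zeros(out_features, in_features))

    def weight_stats(self):
        p_nonzero = torch.sigmoid(self.a)
        p_pos_given_nonzero = torch.sigmoid(self.b)
        p_pos = p_nonzero * p_pos_given_nonzero        # P(w = +1)
        p_neg = p_nonzero * (1 - p_pos_given_nonzero)  # P(w = -1)
        mean = p_pos - p_neg                           # E[w]
        var = p_pos + p_neg - mean ** 2                # Var[w]
        return mean, var

    def forward(self, x):
        mean, var = self.weight_stats()
        # Mean and variance of each pre-activation under the weight distribution.
        act_mean = F.linear(x, mean)
        act_var = F.linear(x ** 2, var)
        # Sample the pre-activation directly (local reparameterization trick).
        eps = torch.randn_like(act_mean)
        return act_mean + torch.sqrt(act_var + 1e-8) * eps

# Example usage: a batch of 32 MNIST-sized inputs.
# layer = TernaryLocalReparamLinear(784, 300)
# out = layer(torch.randn(32, 784))   # shape (32, 300)
```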
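The Open Datasets and Dataset Splits rows quote the standard MNIST (60K/10K) and CIFAR-10 (50K/10K) train/test splits. As a quick check, these splits can be loaded with torchvision; this is an assumption for illustration, since the paper does not name its data-loading tooling.

```python
from torchvision import datasets, transforms

# Standard train/test splits quoted above: MNIST 60K/10K, CIFAR-10 50K/10K.
to_tensor = transforms.ToTensor()
mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)

print(len(mnist_train), len(mnist_test))   # 60000 10000
print(len(cifar_train), len(cifar_test))   # 50000 10000
```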
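The Experiment Setup row lists concrete hyperparameters. Below is a minimal configuration sketch of the MNIST-style schedule in PyTorch, assuming Adam (which the paper mentions) and a placeholder model; the regularization terms are only indicated in comments, since their exact form is defined in the paper.

```python
import torch

# Hyperparameters quoted in the Experiment Setup row above (MNIST-style schedule).
# The model, data loader, and regularizer terms are placeholders, not the
# authors' implementation.
BATCH_SIZE = 256
INIT_LR = 0.01
BETA = 1e-6         # binary setting: coefficient of the paper's regularizer
PROB_DECAY = 1e-11  # ternary setting: probability decay coefficient
EPOCHS = 190

model = torch.nn.Linear(784, 10)  # stand-in for the discrete-weight network
optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR)
# Learning rate divided by 10 after 100 epochs of training.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.1)

for epoch in range(EPOCHS):
    # ... iterate over mini-batches of size BATCH_SIZE, add the BETA /
    # PROB_DECAY regularization terms to the loss, and call optimizer.step() ...
    scheduler.step()
```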