Loss-aware Weight Quantization of Deep Networks
Authors: Lu Hou, James T. Kwok
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on feedforward and recurrent neural networks show that the proposed scheme outperforms state-of-the-art weight quantization algorithms, and is as accurate (or even more accurate) than the full-precision network. |
| Researcher Affiliation | Academia | Lu Hou, James T. Kwok, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong. {lhouab, jamesk}@cse.ust.hk |
| Pseudocode | Yes | Algorithm 3 Loss-Aware Ternarization (LAT) for training a feedforward neural network. Algorithm 4 Exact solver for ŵ_l^t with two scaling parameters. Algorithm 5 Approximate solver for ŵ_l^t with two scaling parameters. (An illustrative sketch of the loss-aware ternarization step appears after this table.) |
| Open Source Code | No | No explicit statement about releasing their own source code was found. |
| Open Datasets | Yes | 1. MNIST: This contains 28×28 gray images from 10 digit classes. We use 50,000 images for training, another 10,000 for validation, and the remaining 10,000 for testing. 2. CIFAR-10: This contains 32×32 color images from 10 object classes. We use 45,000 images for training, another 5,000 for validation, and the remaining 10,000 for testing. 3. CIFAR-100: This contains 32×32 color images from 100 object classes. We use 45,000 images for training, another 5,000 for validation, and the remaining 10,000 for testing. 4. SVHN: This contains 32×32 color images from 10 digit classes. We use 598,388 images for training, another 6,000 for validation, and the remaining 26,032 for testing. The Penn Treebank data set (Taylor et al., 2003): ... with 5,017K characters for training, 393K for validation, and 442K characters for testing. |
| Dataset Splits | Yes | 1. MNIST: ... 10,000 for validation... 2. CIFAR-10: ... 5,000 for validation... 3. CIFAR-100: ... 5,000 for validation... 4. SVHN: ... 6,000 for validation... The Penn Treebank data set (Taylor et al., 2003): ... 393K for validation... |
| Hardware Specification | No | The paper mentions "NVIDIA for the gift of GPU card" but does not specify any particular GPU model, CPU, or other hardware components used for experiments. |
| Software Dependencies | Yes | We thank the developers of Theano (Theano Development Team, 2016), Pylearn2 (Goodfellow et al., 2013) and Lasagne. |
| Experiment Setup | Yes | For MNIST: 'Batch normalization with a minibatch size of 100 is used to accelerate learning. The maximum number of epochs is 50. The learning rate starts at 0.01, and decays by a factor of 0.1 at epochs 15 and 25.' For LSTMs: 'We use a one-layer LSTM with 512 cells. The maximum number of epochs is 200, and the number of time steps is 100. The initial learning rate is 0.002. After 10 epochs, it is decayed by a factor of 0.98 after each epoch. The weights are initialized uniformly in [-0.08, 0.08]. After each iteration, the gradients are clipped to the range [-5, 5].' (See the learning-rate schedule sketch after this table.) |
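
The Pseudocode row refers to the paper's loss-aware ternarization solvers. As a minimal sketch of the underlying idea only, and not the authors' exact Algorithms 3-5 (which operate layer-wise, optionally with two scaling parameters, and take the diagonal curvature estimate from Adam's second moments), the snippet below ternarizes a weight vector `w` to `alpha * b` with `b` in {-1, 0, +1} by alternating minimization of a diagonally weighted squared error. The function name `loss_aware_ternarize` and the toy inputs are our own illustrative choices.

```python
import numpy as np

def loss_aware_ternarize(w, d, n_iter=20):
    """Illustrative sketch (not the paper's exact algorithm): find alpha > 0 and
    b in {-1, 0, +1}^n approximately minimizing sum_i d_i * (w_i - alpha*b_i)^2,
    where d is a diagonal curvature estimate (e.g. Adam's second moments)."""
    # Initialize the scale from the curvature-weighted mean magnitude.
    alpha = np.sum(d * np.abs(w)) / np.sum(d)
    b = np.zeros_like(w)
    for _ in range(n_iter):
        # With alpha fixed, the optimal ternary code keeps entries whose
        # magnitude exceeds alpha / 2 and zeros out the rest.
        b = np.where(np.abs(w) >= alpha / 2, np.sign(w), 0.0)
        denom = np.sum(d * b * b)
        if denom == 0:
            break
        # With b fixed, the optimal scale is a weighted least-squares fit.
        alpha = np.sum(d * w * b) / denom
    return alpha, b

# Toy usage: curvature-weighted ternarization of a small weight vector.
w = np.array([0.9, -0.05, 0.4, -0.7, 0.02])
d = np.array([1.0, 0.1, 0.5, 2.0, 0.1])   # hypothetical curvature estimates
alpha, b = loss_aware_ternarize(w, d)
print(alpha, b)   # quantized weights are alpha * b
```

The loop simply alternates between the closed-form threshold rule for `b` given `alpha` and the weighted least-squares fit for `alpha` given `b`, which is the flavor of alternating solver the paper's algorithms describe.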
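For the Experiment Setup row, the quoted learning-rate schedules can be written down directly. The helpers below are a hedged sketch under our reading of the quotes (for example, that the LSTM rate first decays after epoch 10); the function names are illustrative and not from the paper.

```python
def mnist_lr(epoch, base_lr=0.01):
    """MNIST schedule as quoted: start at 0.01, decay by 0.1 at epochs 15 and 25."""
    lr = base_lr
    if epoch >= 15:
        lr *= 0.1
    if epoch >= 25:
        lr *= 0.1
    return lr

def lstm_lr(epoch, base_lr=0.002, decay=0.98):
    """LSTM schedule as quoted: 0.002 for the first 10 epochs, then multiplied
    by 0.98 after each subsequent epoch (our interpretation of the quote)."""
    return base_lr * (decay ** max(0, epoch - 10))

# Spot-check the schedules at a few epochs.
for e in (0, 14, 15, 25, 49):
    print("mnist", e, mnist_lr(e))
for e in (0, 10, 11, 199):
    print("lstm", e, round(lstm_lr(e), 6))
```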