Trained Ternary Quantization

Authors: Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

ICLR 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on CIFAR-10 show that the ternary models obtained by the trained quantization method outperform full-precision models of ResNet-32, 44, and 56 by 0.04%, 0.16%, and 0.36%, respectively. On ImageNet, our model outperforms the full-precision AlexNet model by 0.3% Top-1 accuracy and outperforms previous ternary models by 3%. |
| Researcher Affiliation | Collaboration | Chenzhuo Zhu (Tsinghua University, zhucz13@mails.tsinghua.edu.cn); Song Han (Stanford University, songhan@stanford.edu); Huizi Mao (Stanford University, huizi@stanford.edu); William J. Dally (Stanford University and NVIDIA, dally@stanford.edu) |
| Pseudocode | No | The paper describes procedures and includes a diagram (Figure 1), but it does not provide formal pseudocode or algorithm blocks (see the ternarization sketch after this table). |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We perform our experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | CIFAR-10 is an image classification benchmark containing 32×32 RGB images in a training set of 50000 and a test set of 10000. [...] ILSVRC12 is a 1000-category dataset with over 1.2 million images in the training set and 50 thousand images in the validation set. Images from ILSVRC12 also have various resolutions. We used a variant of the AlexNet (Krizhevsky et al., 2012) structure, removing dropout layers and adding batch normalization (Ioffe & Szegedy, 2015), for all models in our experiments. |
| Hardware Specification | No | The paper mentions that "On custom hardware, multiplications can be pre-computed..." but does not specify the hardware used to run the experiments described in the paper (e.g., specific GPU or CPU models). |
| Software Dependencies | No | "Our network is implemented on both TensorFlow (Abadi et al., 2015) and Caffe (Jia et al., 2014) frameworks." The paper names the frameworks but does not provide specific version numbers for TensorFlow or Caffe. |
| Experiment Setup | Yes | Learning rate is set to 0.1 at the beginning and scaled by 0.1 at epochs 80, 120, and 300. An L2-normalized weight decay of 0.0002 is used as regularizer. Most of our models converge after 160 epochs. We take a moving average on errors of all epochs to filter off fluctuations when reporting error rate. (...) Minibatch size is set to 128. Learning rate starts at 10^-4 and is scaled by 0.2 at epochs 56 and 64. An L2-normalized weight decay of 5×10^-6 is used as a regularizer. Images are first resized to 256×256 and then randomly cropped to 224×224 before input (see the training-setup sketch after this table). |
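
Since the paper provides no formal pseudocode and no released code, the following is a minimal NumPy sketch of the thresholded ternarization with per-layer trained scaling factors that the paper describes: latent full-precision weights are compared against a threshold proportional to the layer's maximum absolute weight, and the resulting positive/negative regions are replaced by two learned coefficients Wp and Wn. The function name `ternarize`, the variable names, and the default `threshold_factor` are our own choices for illustration, not the authors' implementation.

```python
import numpy as np

def ternarize(w, w_p, w_n, threshold_factor=0.05):
    """Sketch of TTQ-style ternarization for one layer.

    w: full-precision (latent) weight tensor.
    w_p, w_n: learned positive/negative scaling factors for this layer.
    threshold_factor: fraction of max|w| used as the ternarization threshold
        (treated here as a hyperparameter; the paper reports using a small
        constant fraction across layers).

    Returns the ternary weights in {-w_n, 0, +w_p} plus the two masks,
    which a backward pass would also need to route gradients to w_p, w_n,
    and the latent weights.
    """
    delta = threshold_factor * np.max(np.abs(w))  # per-layer threshold
    pos_mask = w > delta
    neg_mask = w < -delta
    w_ternary = np.zeros_like(w)
    w_ternary[pos_mask] = w_p
    w_ternary[neg_mask] = -w_n
    return w_ternary, pos_mask, neg_mask


# Example: ternarize a random 3x3 conv kernel and report sparsity.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 64, 3, 3)).astype(np.float32)
w_t, pos, neg = ternarize(w, w_p=0.8, w_n=0.7)
print("zero fraction:", 1.0 - (pos.sum() + neg.sum()) / w.size)
```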
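
The Experiment Setup row quotes two training recipes (CIFAR-10 ResNets and the ImageNet AlexNet variant). The snippet below re-expresses those hyperparameters as a plain-Python step-decay learning-rate schedule together with the stated batch size, weight decay, and input sizes. The dictionary layout and the `step_lr` helper are ours, and the paper's actual runs used TensorFlow and Caffe rather than this standalone sketch.

```python
# Hyperparameters as quoted in the Experiment Setup row; the dictionary
# structure and helper below are illustrative, not taken from the paper.
CIFAR10_SETUP = {
    "base_lr": 0.1,
    "lr_decay": 0.1,
    "decay_epochs": (80, 120, 300),
    "weight_decay": 2e-4,          # "L2-normalized weight decay of 0.0002"
}

IMAGENET_SETUP = {
    "batch_size": 128,
    "base_lr": 1e-4,
    "lr_decay": 0.2,
    "decay_epochs": (56, 64),
    "weight_decay": 5e-6,
    "resize": 256,                 # resize to 256x256 ...
    "crop": 224,                   # ... then randomly crop to 224x224
}

def step_lr(epoch, base_lr, lr_decay, decay_epochs):
    """Step-decay schedule: multiply the learning rate by `lr_decay`
    each time `epoch` passes one of the boundaries in `decay_epochs`."""
    lr = base_lr
    for boundary in decay_epochs:
        if epoch >= boundary:
            lr *= lr_decay
    return lr

# Example: learning rate used on ImageNet at a few epochs.
for epoch in (0, 56, 64):
    print(epoch, step_lr(epoch, IMAGENET_SETUP["base_lr"],
                         IMAGENET_SETUP["lr_decay"],
                         IMAGENET_SETUP["decay_epochs"]))
```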