Trained Ternary Quantization
Authors: Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10 show that the ternary models obtained by trained quantization method outperform full-precision models of ResNet-32, 44, 56 by 0.04%, 0.16%, 0.36%, respectively. On ImageNet, our model outperforms full-precision AlexNet model by 0.3% of Top-1 accuracy and outperforms previous ternary models by 3%. |
| Researcher Affiliation | Collaboration | Chenzhuo Zhu (Tsinghua University, zhucz13@mails.tsinghua.edu.cn); Song Han (Stanford University, songhan@stanford.edu); Huizi Mao (Stanford University, huizi@stanford.edu); William J. Dally (Stanford University and NVIDIA, dally@stanford.edu) |
| Pseudocode | No | The paper describes its procedures in prose and includes a diagram (Figure 1), but it does not provide formal pseudocode or algorithm blocks. An illustrative sketch of the ternarization step is given below the table. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We perform our experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | CIFAR-10 is an image classification benchmark containing images of size 32×32 RGB pixels in a training set of 50000 and a test set of 10000. [...] ILSVRC12 is a 1000-category dataset with over 1.2 million images in the training set and 50 thousand images in the validation set. Images from ILSVRC12 also have various resolutions. We used a variant of the AlexNet (Krizhevsky et al., 2012) structure by removing dropout layers and adding batch normalization (Ioffe & Szegedy, 2015) for all models in our experiments. |
| Hardware Specification | No | The paper mentions that "On custom hardware, multiplications can be pre-computed..." but does not specify the hardware used for running the experiments described in the paper (e.g., specific GPU or CPU models). |
| Software Dependencies | No | Our network is implemented on both the TensorFlow (Abadi et al., 2015) and Caffe (Jia et al., 2014) frameworks. The paper names the frameworks but does not provide specific version numbers for TensorFlow or Caffe. |
| Experiment Setup | Yes | Learning rate is set to 0.1 at the beginning and scaled by 0.1 at epochs 80, 120 and 300. An L2-normalized weight decay of 0.0002 is used as a regularizer. Most of our models converge after 160 epochs. We take a moving average on errors of all epochs to filter off fluctuations when reporting error rate. [...] Minibatch size is set to 128. Learning rate starts at 10⁻⁴ and is scaled by 0.2 at epochs 56 and 64. An L2-normalized weight decay of 5×10⁻⁶ is used as a regularizer. Images are first resized to 256×256 then randomly cropped to 224×224 before input. A sketch of this schedule appears below the table. |
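Since the paper provides no formal pseudocode, the following is a minimal NumPy sketch of the ternarization step in the spirit of the method the paper describes: weights above a layer-wise threshold map to a learned positive scale, weights below the negative threshold map to a learned negative scale, and everything in between becomes zero. The threshold factor `t` and the unit scales `w_p`, `w_n` used here are illustrative assumptions, not values taken from the quoted text.

```python
import numpy as np

def ttq_ternarize(w, t=0.05, w_p=1.0, w_n=1.0):
    """Ternarize a full-precision weight tensor.

    Weights larger than a layer-wise threshold become +w_p, weights smaller
    than the negative threshold become -w_n, and the rest become 0.
    The threshold factor t and the scales w_p / w_n are placeholders;
    in the paper the scaling coefficients are trained per layer.
    """
    delta = t * np.max(np.abs(w))   # layer-wise threshold
    q = np.zeros_like(w)
    q[w > delta] = w_p              # positive ternary weights
    q[w < -delta] = -w_n            # negative ternary weights
    return q

# Example: ternarize a random weight matrix.
w = np.random.randn(64, 64).astype(np.float32)
w_ternary = ttq_ternarize(w)
```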
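The ImageNet setup quoted in the Experiment Setup row (minibatch size 128, learning rate 10⁻⁴ scaled by 0.2 at epochs 56 and 64, weight decay 5×10⁻⁶, 256×256 resize with a 224×224 random crop) translates roughly into the PyTorch sketch below. The optimizer choice (SGD) and the placeholder model are assumptions; the paper's experiments were run on TensorFlow and Caffe.

```python
import torch
import torchvision.transforms as T

# Preprocessing as described: resize to 256x256, then random-crop to 224x224.
train_transform = T.Compose([
    T.Resize((256, 256)),
    T.RandomCrop(224),
    T.ToTensor(),
])

# Placeholder standing in for the paper's AlexNet variant (dropout removed,
# batch normalization added); any small module works for illustration.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=11, stride=4),
    torch.nn.BatchNorm2d(64),
)

# SGD is an assumption; the quoted setup only gives the rate and the decay.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=5e-6)

# Learning rate scaled by 0.2 at epochs 56 and 64, per the quoted setup.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[56, 64], gamma=0.2)
```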