HAWQ-V3: Dyadic Neural Network Quantization
Authors: Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation of the proposed methods on ResNet18/50 and InceptionV3, for various model compression levels with/without mixed precision. For ResNet50, our INT8 quantization achieves an accuracy of 77.58%, which is 2.68% higher than prior integer-only work, and our mixed-precision INT4/8 quantization can reduce INT8 latency by 23% and still achieve 76.73% accuracy. |
| Researcher Affiliation | Collaboration | 1. University of California, Berkeley; 2. Amazon; 3. Shanghai Jiao Tong University. Correspondence to: Zhewei Yao <zheweiy@berkeley.edu>, Amir Gholami <amirgh@berkeley.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Our framework and the TVM implementation have been open sourced (HAWQ, 2020). The citation (HAWQ, 2020) refers to 'HAWQ. https://github.com/zhen-dong/hawq.git, October 2020.' |
| Open Datasets | Yes | We first start with ResNet18/50 and InceptionV3 quantization on ImageNet. ImageNet is cited as: 'Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248-255. IEEE, 2009.' |
| Dataset Splits | No | The paper uses standard datasets like ImageNet and models like ResNet18/50 and Inception V3, which typically have predefined splits. However, the paper does not explicitly state the specific training/test/validation dataset splits (e.g., percentages or sample counts) used for its experiments within the provided text. |
| Hardware Specification | Yes | We target Nvidia Turing Tensor Cores of T4 GPU for deployment, as it supports both INT8 and INT4 precision and has been enhanced for deep learning network inference. By profiling the latency of different layers, we show that we can achieve an average of 1.47x speedup with INT4, as compared to INT8 on a T4 GPU for ResNet50. |
| Software Dependencies | No | The paper mentions software like the 'open source PULP library (Roy & Mitchell, 2020) in Python', 'Apache TVM (Chen et al., 2018)', and 'PyTorch' but does not specify their version numbers, which is necessary for reproducibility. |
| Experiment Setup | Yes | Detailed discussion on the implementation and setup is provided in Appendix H. In Appendix H, the paper states: 'For ImageNet, we use SGD optimizer with Nesterov momentum of 0.9. We train our models with a batch size of 256 for 300 epochs. We set an initial learning rate of 0.1 for ResNet18/50 and 0.045 for Inception V3, and decay it by a factor of 10 at 100, 200, 250 epochs respectively. We use a weight decay of 1e-4. For distillation, we use the temperature of 2.0, and the distillation loss has a weight of 0.9.' |
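
The Appendix H recipe quoted in the Experiment Setup row maps directly onto a standard PyTorch training configuration. Below is a minimal sketch of that setup; the model/teacher constructors, the data loader, and the 0.1 weight on the hard-label loss are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the ImageNet recipe from Appendix H (SGD + Nesterov,
# batch size 256, 300 epochs, LR 0.1 decayed 10x at epochs 100/200/250,
# weight decay 1e-4, distillation temperature 2.0 with loss weight 0.9).
import torch
import torch.nn.functional as F
from torchvision import models

student = models.resnet50()                         # placeholder student model
teacher = models.resnet50(pretrained=True).eval()   # placeholder full-precision teacher

optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, nesterov=True, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 200, 250], gamma=0.1)

T, distill_weight = 2.0, 0.9   # temperature and distillation loss weight from the paper

def loss_fn(student_logits, teacher_logits, labels):
    # Soft-target KL term at temperature T, weighted by 0.9, plus the standard
    # cross-entropy on hard labels (assumed to carry the remaining 0.1 weight).
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return distill_weight * soft + (1.0 - distill_weight) * hard

# Training loop over 300 epochs with batch size 256 (data loader omitted):
# for epoch in range(300):
#     for images, labels in train_loader:
#         optimizer.zero_grad()
#         with torch.no_grad():
#             t_logits = teacher(images)
#         loss_fn(student(images), t_logits, labels).backward()
#         optimizer.step()
#     scheduler.step()
```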
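
For context on the "dyadic" quantization named in the title: HAWQ-V3 keeps inference integer-only by expressing requantization scales as dyadic numbers b/2^c, so rescaling an INT32 accumulator reduces to an integer multiply and a bit shift. The sketch below illustrates that idea only; the helper names, the 16-bit mantissa width, and the example scale are assumptions, not the authors' implementation.

```python
# Illustrative dyadic rescaling: approximate a real-valued scale by b / 2**c
# (b, c integers), then requantize with integer multiply + right shift.

def to_dyadic(scale: float, bits: int = 16):
    """Approximate `scale` as b / 2**c with b an integer of at most `bits` bits."""
    c = 0
    while abs(round(scale * (1 << c))) < (1 << (bits - 1)) and c < 31:
        c += 1
    c -= 1
    b = round(scale * (1 << c))
    return b, c

def dyadic_rescale(acc: int, b: int, c: int) -> int:
    """Requantize an integer accumulator: (acc * b) >> c, with rounding."""
    if c <= 0:
        return acc * b
    return (acc * b + (1 << (c - 1))) >> c

# Example: fold an effective scale (e.g. S_w * S_x / S_out) into a dyadic multiplier.
b, c = to_dyadic(0.0237)
print(b, c)                                  # dyadic approximation of 0.0237
print(dyadic_rescale(12345, b, c))           # integer-only result
print(round(12345 * 0.0237))                 # floating-point reference
```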