HAWQ-V3: Dyadic Neural Network Quantization

Authors: Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | extensive evaluation of the proposed methods on ResNet18/50 and InceptionV3, for various model compression levels with/without mixed precision. For ResNet50, our INT8 quantization achieves an accuracy of 77.58%, which is 2.68% higher than prior integer-only work, and our mixed-precision INT4/8 quantization can reduce INT8 latency by 23% and still achieve 76.73% accuracy. (A minimal sketch of the dyadic rescaling that underlies the integer-only pipeline follows the table.)
Researcher Affiliation | Collaboration | 1 University of California, Berkeley; 2 Amazon; 3 Shanghai Jiao Tong University. Correspondence to: Zhewei Yao <zheweiy@berkeley.edu>, Amir Gholami <amirgh@berkeley.edu>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Our framework and the TVM implementation have been open sourced (HAWQ, 2020). The citation (HAWQ, 2020) refers to 'HAWQ. https://github.com/zhen-dong/hawq.git, October 2020.'
Open Datasets | Yes | We first start with ResNet18/50 and InceptionV3 quantization on ImageNet. ImageNet is cited as: 'Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248-255. IEEE, 2009.'
Dataset Splits | No | The paper uses standard datasets like ImageNet (which comes with predefined splits) and models like ResNet18/50 and InceptionV3. However, it does not explicitly state the specific training/test/validation splits (e.g., percentages or sample counts) used for its experiments within the provided text.
Hardware Specification | Yes | We target Nvidia Turing Tensor Cores of T4 GPU for deployment, as it supports both INT8 and INT4 precision and has been enhanced for deep learning network inference. By profiling the latency of different layers, we show that we can achieve an average of 1.47× speedup with INT4, as compared to INT8 on a T4 GPU for ResNet50.
Software Dependencies | No | The paper mentions software like the 'open source PULP library (Roy & Mitchell, 2020) in Python', 'Apache TVM (Chen et al., 2018)', and 'PyTorch', but does not specify their version numbers, which is necessary for reproducibility. (An illustrative ILP sketch using the PuLP package follows the table.)
Experiment Setup | Yes | Detailed discussion on the implementation and setup is provided in Appendix H. In Appendix H, the paper states: 'For ImageNet, we use SGD optimizer with Nesterov momentum of 0.9. We train our models with a batch size of 256 for 300 epochs. We set an initial learning rate of 0.1 for ResNet18/50 and 0.045 for InceptionV3, and decay it by a factor of 10 at 100, 200, 250 epochs respectively. We use a weight decay of 1e-4. For distillation, we use the temperature of 2.0, and the distillation loss has a weight of 0.9.' (A PyTorch sketch of this configuration follows the table.)
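The "dyadic" in the title refers to HAWQ-V3's integer-only pipeline, in which the floating-point rescaling factor between quantized layers is approximated by a dyadic number b / 2^c, so that requantization needs only an integer multiply and a bit shift. The sketch below illustrates that idea only; the helper names and the fixed choice of c are assumptions, not the paper's exact dyadic-conversion routine.

```python
def dyadic_approx(scale: float, c: int = 31):
    """Approximate a real rescaling factor as b / 2^c (a dyadic number).

    Hypothetical helper: the paper's exact rounding rule and choice of c
    may differ.
    """
    b = int(round(scale * (1 << c)))
    return b, c


def requantize(acc: int, b: int, c: int) -> int:
    """Integer-only requantization of an INT32 accumulator: (acc * b) >> c."""
    return (acc * b) >> c


# Example: rescale an accumulator by S_w * S_a / S_out ~= 0.0123
b, c = dyadic_approx(0.0123)
print(requantize(100_000, b, c))  # ~1230, computed with integer ops only
```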
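The Software Dependencies row mentions the open-source PULP library for Python, which the paper uses to solve an integer linear program that assigns a bit-width (e.g., INT4 or INT8) to each layer under hardware constraints. The sketch below is a generic illustration of that kind of ILP written with the PuLP package; the layer names, sensitivity values, latency numbers, and budget are made up for illustration, and the paper's exact objective and constraints may differ.

```python
import pulp

# Hypothetical per-layer data: candidate bit-widths with an associated
# sensitivity (e.g., Hessian-based perturbation) and a latency cost.
layers = ["conv1", "conv2", "conv3"]
bits = [4, 8]
sensitivity = {("conv1", 4): 0.9, ("conv1", 8): 0.1,
               ("conv2", 4): 0.5, ("conv2", 8): 0.05,
               ("conv3", 4): 0.7, ("conv3", 8): 0.08}
latency = {("conv1", 4): 1.0, ("conv1", 8): 1.8,
           ("conv2", 4): 2.0, ("conv2", 8): 3.5,
           ("conv3", 4): 1.5, ("conv3", 8): 2.6}
latency_budget = 6.0

prob = pulp.LpProblem("mixed_precision_bit_allocation", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (layers, bits), cat="Binary")

# Objective: minimize the total sensitivity of the chosen bit assignment.
prob += pulp.lpSum(sensitivity[(l, b)] * x[l][b] for l in layers for b in bits)

# Each layer gets exactly one bit-width.
for l in layers:
    prob += pulp.lpSum(x[l][b] for b in bits) == 1

# Total latency must stay within the budget.
prob += pulp.lpSum(latency[(l, b)] * x[l][b] for l in layers for b in bits) <= latency_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
assignment = {l: b for l in layers for b in bits if pulp.value(x[l][b]) > 0.5}
print(assignment)  # e.g., {'conv1': 8, 'conv2': 8, 'conv3': 4}
```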
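The Experiment Setup row quotes the Appendix H hyperparameters. Below is a minimal PyTorch sketch of that optimizer/scheduler configuration, using a stock torchvision ResNet-50 as a stand-in for the quantized model; the weighted soft/hard distillation loss form is a common convention and an assumption here, since the paper only states the temperature and the loss weight.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50()  # stand-in; the actual quantized model is defined in the HAWQ repo

# SGD with Nesterov momentum 0.9 and weight decay 1e-4 (Appendix H).
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,              # 0.045 for InceptionV3
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-4,
)

# 300 epochs, batch size 256; decay the learning rate by 10x at epochs 100, 200, 250.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 200, 250], gamma=0.1
)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.9):
    """Soft + hard loss with temperature 2.0 and distillation weight 0.9.

    The weighted-sum form is an assumption; the paper only specifies T and alpha.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```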