HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Authors: Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that HAWQ-V2 achieves new state-of-the-art results for a wide range of tasks. In particular, we present quantization results for Inception-V3 (7.57MB with 75.98% accuracy), ResNet50 (7.99MB with 75.92% accuracy), and SqueezeNext (1MB with 68.68% accuracy), all without any manual bit selection. Furthermore, we present results for object detection on Microsoft COCO, where we achieve 2.6 higher mAP than direct uniform quantization and 1.6 higher mAP than the recently proposed method of FQN, with a smaller model size of 17.9MB. |
| Researcher Affiliation | Academia | Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer University of California, Berkeley, {zhendong, zheweiy, daiyaanarfeen, amirgh, mahoneymw, and keutzer}@berkeley.edu |
| Pseudocode | No | While the paper describes algorithms such as Hutchinson's method, it does not present them in a structured pseudocode block or a clearly labeled 'Algorithm' figure. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We present quantization results for Inception-V3, ResNet50, and SqueezeNext (Table 1). Furthermore, we present results for object detection on the Microsoft COCO dataset, where HAWQ-V2 achieves 2.6 higher mAP than direct uniform quantization and 1.6 higher mAP than the recently proposed method of FQN, with a smaller model size of 17.9MB. [...] 3.2 ImageNet [...] 3.3 Microsoft COCO |
| Dataset Splits | No | The paper mentions 'training the network' and general evaluation, but it does not specify the training, validation, or test dataset splits (e.g., percentages or exact sample counts) for any of the datasets used, nor does it refer to standard named splits for reproduction. |
| Hardware Specification | Yes | For example, we can compute the Hessian trace for all 54 layers in ResNet50 in less than 30 minutes with 4 GPUs (only 33s per block on average). [...] We also gratefully acknowledge the support of NVIDIA Corporation for their donation of two Titan Xp GPUs used for this research. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide a specific version number for PyTorch or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We can see that 50 Hutchinson iterations are sufficient to achieve an accurate approximation with low variance. Based on the convergence analysis, we are able to calculate all the average Hessian traces, shown in Figure 2, corresponding to 54 blocks in a ResNet50 model, within 30 minutes (33s per block on average) using 4 GPUs. [...] The main idea is to sort each candidate bit-precision setting in B based on the total second-order perturbation that they cause, according to the following metric: `Tr(H_i) * \|\|Q(W_i) - W_i\|\|_2^2` (Eq. 10) |
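The two technical ingredients quoted above, Hutchinson's stochastic trace estimator and the per-layer sensitivity metric `Tr(H_i) * ||Q(W_i) - W_i||_2^2` from Eq. (10), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (which operates on Hessian-vector products of a full network in PyTorch); here `H` is a toy explicit matrix standing in for a layer Hessian, and `uniform_quantize` is a hypothetical quantizer added purely for illustration.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_iters=50, seed=None):
    """Estimate Tr(H) using only matrix-vector products H @ z
    (Hutchinson's method): E[z^T H z] = Tr(H) for Rademacher z."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_iters):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += z @ matvec(z)
    return total / n_iters

# Toy stand-in for a layer Hessian: a small explicit symmetric matrix.
H = np.array([[2.0, 0.5],
              [0.5, 3.0]])  # exact trace = 5.0
trace_est = hutchinson_trace(lambda v: H @ v, dim=2, n_iters=5000, seed=0)

# Sensitivity metric in the spirit of Eq. (10):
# Tr(H_i) * ||Q(W_i) - W_i||_2^2, with a hypothetical uniform quantizer Q.
def uniform_quantize(w, bits):
    scale = (w.max() - w.min()) / (2**bits - 1)
    return np.round((w - w.min()) / scale) * scale + w.min()

W = np.array([0.12, -0.4, 0.33, 0.05])
perturbation = np.sum((uniform_quantize(W, bits=4) - W) ** 2)
sensitivity = trace_est * perturbation
```

In the paper this metric is evaluated for every candidate bit-precision setting in B, and settings are sorted by total perturbation; the sketch shows only the per-layer computation under the stated assumptions.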