HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Authors: Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that HAWQ-V2 achieves new state-of-the-art results for a wide range of tasks. In particular, we present quantization results for Inception V3 (7.57MB with 75.98% accuracy), ResNet50 (7.99MB with 75.92% accuracy), and SqueezeNext (1MB with 68.68% accuracy), all without any manual bit selection. Furthermore, we present results for object detection on Microsoft COCO, where we achieve 2.6 higher mAP than direct uniform quantization and 1.6 higher mAP than the recently proposed method of FQN, with a smaller model size of 17.9MB.
Researcher Affiliation | Academia | Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer, University of California, Berkeley, {zhendong, zheweiy, daiyaanarfeen, amirgh, mahoneymw, keutzer}@berkeley.edu
Pseudocode | No | While the paper describes algorithms such as Hutchinson's method, it does not present them in a structured pseudocode block or a clearly labeled 'Algorithm' figure.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We present quantization results for Inception V3, ResNet50, and SqueezeNext (Table 1). Furthermore, we present results for object detection on the Microsoft COCO dataset, where HAWQ-V2 achieves 2.6 higher mAP than direct uniform quantization and 1.6 higher mAP than the recently proposed method of FQN, with a smaller model size of 17.9MB. [...] 3.2 ImageNet [...] 3.3 Microsoft COCO
Dataset Splits | No | The paper mentions 'training the network' and general evaluation, but it does not specify the training, validation, or test dataset splits (e.g., percentages or exact sample counts) for any of the datasets used, nor does it refer to standard named splits for reproduction.
Hardware Specification | Yes | For example, we can compute Hessian trace for all 54 layers in ResNet50 in less than 30 minutes with 4 GPUs (only 33s per block on average). [...] We also gratefully acknowledge the support of NVIDIA Corporation for their donation of two Titan Xp GPUs used for this research.
Software Dependencies | No | The paper mentions 'PyTorch' but does not provide a specific version number for PyTorch or any other software dependencies, which is required for reproducibility.
Experiment Setup | Yes | We can see that 50 Hutchinson iterations are sufficient to achieve an accurate approximation with low variance. Based on the convergence analysis, we are able to calculate all the average Hessian traces, shown in Figure 2, corresponding to 54 blocks in a ResNet50 model, within 30 minutes (33s per block on average) using 4 GPUs. [...] The main idea is to sort each candidate bit-precision setting in B based on the total second-order perturbation that they cause, according to the following metric: Tr(H_i) * ||Q(W_i) - W_i||_2^2 (Eq. 10)
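Since the paper presents neither Hutchinson's trace estimator nor the Eq. (10) sensitivity metric as pseudocode, a minimal framework-agnostic sketch may help. This is an illustration, not the authors' implementation: NumPy is used for self-containment, and the `hvp` oracle is a hypothetical stand-in for the Hessian-vector product that PyTorch would compute via a double-backward pass through the loss.

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_iters=50, seed=0):
    """Hutchinson's estimator: E[z^T H z] = Tr(H) for Rademacher z.

    hvp: oracle computing the Hessian-vector product H @ v
         (in PyTorch this would be a double-backward pass through the loss).
    """
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(n_iters):
        z = rng.integers(0, 2, size=dim).astype(float) * 2.0 - 1.0  # +/-1 entries
        est += z @ hvp(z)  # one sample of z^T H z
    return est / n_iters

def perturbation_metric(avg_trace, w, w_quant):
    """Second-order sensitivity of Eq. (10): Tr(H_i) * ||Q(W_i) - W_i||_2^2."""
    return avg_trace * float(np.sum((w_quant - w) ** 2))

# Toy check with a known diagonal Hessian H = diag(1, 2, 3), so Tr(H) = 6.
H = np.diag([1.0, 2.0, 3.0])
trace = hutchinson_trace(lambda v: H @ v, dim=3, n_iters=50)
print(round(trace, 6))  # 6.0 exactly for a diagonal H, since z_i^2 = 1

# Rank two hypothetical quantization grids for one layer by Eq. (10):
# the coarser grid should never score lower than the finer one.
w = np.array([0.50, -0.25, 0.125])
q_coarse = np.round(w * 2) / 2  # illustrative step size of 0.5
q_fine = np.round(w * 8) / 8    # illustrative step size of 0.125
print(perturbation_metric(trace, w, q_coarse) >= perturbation_metric(trace, w, q_fine))  # True
```

Sorting candidate bit settings by this metric is what lets HAWQ-V2 assign per-layer precisions without manual bit selection: layers with large average Hessian trace and large quantization perturbation are kept at higher precision.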