Towards Accurate Post-training Network Quantization via Bit-Split and Stitching

Authors: Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the efficiency of our proposed method. We first evaluate the Bit-Split and Stitching method for weight quantization. Then the performance of Error Compensated Activation Quantization is evaluated. We also compare our method with current post-training methods. All bit-width representations throughout this paper take the sign bit into consideration. Codes are available on GitHub at https://github.com/wps712/BitSplit.
Researcher Affiliation | Academia | NLPR & AIRIA, Institute of Automation, Chinese Academy of Sciences.
Pseudocode | Yes | Algorithm 1: Post-training quantization using Error Compensated Activation Quantization and Bit-Split and Stitching weight quantization. (A hedged sketch of the split/stitch step follows this table.)
Open Source Code | Yes | Codes are available on GitHub at https://github.com/wps712/BitSplit.
Open Datasets | Yes | The top-1 and top-5 accuracy results of post-training quantization are reported using four popular convolutional models pre-trained on the ImageNet dataset. We use the PyTorch pretrained models for all experiments. [...] MS COCO dataset is used for evaluation.
Dataset Splits | Yes | The pre-trained models are trained on 80k training images and 35k validation images (trainval35k), and are evaluated on the remaining 5k validation images (minival).
Hardware Specification | No | The paper mentions running experiments on "GPU" and "TPU" but does not provide specific details such as model numbers, memory, or processor types for the hardware used.
Software Dependencies | No | The paper mentions using "PyTorch pretrained models" and the "mmdetection toolbox" but does not specify version numbers for these software components, which is required for reproducibility.
Experiment Setup | Yes | We quantize all layers into 4-bit except the first layer and the final output layers, which are quantized to 8-bit. Activations are quantized into 8-bit. The experiments are conducted using the mmdetection toolbox. [...] Input images are resized to 800 pixels on the shorter edge. (A config sketch of this bit-width plan follows the table.)
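The Pseudocode row above names the paper's core routine, Bit-Split and Stitching weight quantization. Below is a minimal NumPy sketch of the split/stitch decomposition that routine is built on: an M-bit signed integer weight (sign bit included, as the paper counts it) is split into M-1 ternary bit tensors b_m in {-1, 0, +1} with w = sum_m 2^(m-1) b_m, then stitched back. The function names (`bit_split`, `stitch`, `quantize_weights`) are ours for illustration, and the round-to-nearest split shown here stands in for the paper's alternating per-bit optimization of the bit tensors and the scale.

```python
import numpy as np

def bit_split(w_int, num_bits):
    """Split signed integers in [-(2**(num_bits-1)-1), 2**(num_bits-1)-1]
    into (num_bits - 1) ternary tensors b_m in {-1, 0, +1} such that
    w_int == sum_m 2**m * b_m (the sign bit is absorbed into each b_m)."""
    bits, residual = [], w_int.copy()
    for m in reversed(range(num_bits - 1)):   # most-significant bit first
        b = np.clip(np.round(residual / 2 ** m), -1, 1).astype(np.int64)
        bits.insert(0, b)                     # keep the list ordered LSB-first
        residual = residual - b * 2 ** m
    return bits

def stitch(bits):
    """Stitch ternary bit tensors back into an integer weight tensor."""
    return sum(b * 2 ** m for m, b in enumerate(bits))

def quantize_weights(w_float, num_bits=4):
    """Uniform symmetric quantization followed by bit-split (illustrative
    round-to-nearest; the paper instead minimizes the reconstruction error
    over the bit tensors and the scale alternately)."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4 bits incl. sign
    alpha = np.abs(w_float).max() / qmax      # per-tensor scale
    w_int = np.clip(np.round(w_float / alpha), -qmax, qmax).astype(np.int64)
    bits = bit_split(w_int, num_bits)
    return bits, alpha, alpha * stitch(bits)  # de-quantized approximation

w = np.random.randn(64, 64).astype(np.float32)
bits, alpha, w_hat = quantize_weights(w, num_bits=4)
print("max reconstruction error:", np.abs(w - w_hat).max())  # about alpha / 2
```

Because the split is exact on the clipped integer grid, the reconstruction error of this sketch comes entirely from the uniform rounding step; the paper's alternating optimization further reduces that error per bit tensor.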
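The Experiment Setup row quotes the paper's bit-width protocol. As a concrete reading of it, here is a small PyTorch sketch that builds a per-layer plan: 4-bit weights everywhere except the first convolution and the final classifier (8-bit), with 8-bit activations throughout. `bitwidth_plan` is a hypothetical helper written for this summary, not part of the released BitSplit code.

```python
import torch.nn as nn
import torchvision

def bitwidth_plan(model):
    # Hypothetical helper: collect conv/linear layers in definition order
    # and assign bit-widths following the protocol quoted above.
    layers = [name for name, mod in model.named_modules()
              if isinstance(mod, (nn.Conv2d, nn.Linear))]
    return {name: {"weight_bits": 8 if i in (0, len(layers) - 1) else 4,
                   "act_bits": 8}
            for i, name in enumerate(layers)}

plan = bitwidth_plan(torchvision.models.resnet18())
print(plan["conv1"])  # {'weight_bits': 8, 'act_bits': 8}  (first layer)
print(plan["fc"])     # {'weight_bits': 8, 'act_bits': 8}  (final layer)
```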