Post-Training Sparsity-Aware Quantization
Authors: Gil Shomron, Freddy Gabbay, Samer Kurzum, Uri Weiser
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the impact on model accuracy using PyTorch [26], the ILSVRC-2012 dataset [28], and various CNN models [8, 9, 11, 37, 38] (see Table 1). All models are quantized using a simple uniform min-max quantization, employing symmetric unsigned per-layer quantization for activations and symmetric signed per-kernel quantization for weights. The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the Batch Norm layers' running mean and running variance statistics [29, 33, 35, 36]. In all models, the first convolution layer is left intact, since its input activations, which correspond to the image pixels, do not include many zero values, if any. Quantization is, therefore, performed on all convolution layers, with the exception of the first layer. We present the quantization results in Table 1. Throughout this section, we use SPARQ on top of the 8-bit models (A8W8) and report the accuracy degradation relative to the corresponding FP32 model. A4W8 and A8W4 are presented in Table 1 as references to the worst-case accuracy. (A hedged sketch of this quantization scheme is given after the table.) |
| Researcher Affiliation | Academia | Gil Shomron, Samer Kurzum, and Uri Weiser: Technion Israel Institute of Technology, Haifa, Israel ({gilsho@campus, ssamer15@campus, uri.weiser@ee}.technion.ac.il). Freddy Gabbay: Ruppin Academic Center, Emek Hefer, Israel (freddyg@ruppin.ac.il). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/gilshm/sparq. |
| Open Datasets | Yes | We evaluate the impact on model accuracy using PyTorch [26], the ILSVRC-2012 dataset [28], and various CNN models [8, 9, 11, 37, 38] (see Table 1). |
| Dataset Splits | Yes | The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the Batch Norm layers' running mean and running variance statistics [29, 33, 35, 36]. (A sketch of such a recalibration pass is given after the table.) |
| Hardware Specification | No | The paper describes hardware implementations (systolic arrays, Tensor Cores) and their design and synthesis with a 65nm standard cell library, but it does not specify the hardware (e.g., CPU or GPU models) used to run the deep learning experiments in PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch [26]' but does not provide a specific version number for it or for any other software dependency. |
| Experiment Setup | Yes | All models are quantized using a simple uniform min-max quantization, employing symmetric unsigned per-layer quantization for activations and symmetric signed per-kernel quantization for weights. The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the Batch Norm layers' running mean and running variance statistics [29, 33, 35, 36]. In all models, the first convolution layer is left intact... For the 2:4 structured pruning... retrain the model from scratch for 90 epochs with a learning rate starting from 0.1 and divided by 10 at epochs 30 and 60. Weight decay and momentum are set to 0.0001 and 0.9, respectively. (A sketch of this training schedule is given below.) |
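
For concreteness, below is a minimal PyTorch sketch of the quantization scheme quoted above: symmetric unsigned per-layer quantization for (non-negative, post-ReLU) activations and symmetric signed per-kernel quantization for weights. The function names, the quantize-dequantize ("fake quantization") formulation, and the `1e-8` clamp are illustrative assumptions, not the authors' released implementation (see the linked repository for the actual code). In the paper's setup the activation maximum comes from statistics gathered on 2K calibration images, which is why `act_max` is passed in rather than taken from the current batch.

```python
import torch

def quantize_activations(x, num_bits=8, act_max=None):
    """Symmetric unsigned per-layer fake quantization for non-negative
    (post-ReLU) activations: a single scale for the whole tensor."""
    qmax = 2 ** num_bits - 1
    # Paper setup: act_max is a precomputed min-max statistic from ~2K
    # calibration images; falling back to the batch max is a shortcut.
    max_val = x.max() if act_max is None else torch.as_tensor(act_max, dtype=x.dtype)
    scale = max_val.clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), 0, qmax)
    return q * scale  # quantize-dequantize ("fake quantization")

def quantize_weights(w, num_bits=8):
    """Symmetric signed per-kernel fake quantization: one scale per
    output channel of a conv weight of shape (out_ch, in_ch, kH, kW)."""
    qmax = 2 ** (num_bits - 1) - 1
    w_absmax = w.abs().flatten(1).max(dim=1).values.clamp(min=1e-8)
    scale = (w_absmax / qmax).view(-1, 1, 1, 1)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale
```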
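The Batch Norm recalibration mentioned in the setup can be approximated with a standard recipe: reset each BN layer's running statistics and forward a small calibration set through the model in training mode, without taking optimizer steps. The sketch below assumes a `calib_loader` yielding roughly 2K training images; the cumulative-average trick (`momentum = None`) and the function name are assumptions about one common way to do this, not a description of the authors' exact procedure.

```python
import torch

@torch.no_grad()
def recalibrate_bn(model, calib_loader, num_images=2048):
    """Re-estimate BatchNorm running mean/variance by forwarding calibration
    images through the model; no weights are updated."""
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None  # cumulative moving average over all batches
            m.train()          # only BN layers switch to training mode
    seen = 0
    for images, _ in calib_loader:
        model(images)
        seen += images.size(0)
        if seen >= num_images:
            break
    model.eval()
    return model
```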
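The retraining recipe quoted for the 2:4 structured-pruning baseline (90 epochs, SGD with an initial learning rate of 0.1 divided by 10 at epochs 30 and 60, weight decay 0.0001, momentum 0.9) maps onto standard PyTorch components as sketched below. The function name is hypothetical, and `model` and `train_loader` stand in for the pruned network and the ILSVRC-2012 training pipeline.

```python
import torch

def retrain_2to4_baseline(model, train_loader, epochs=90):
    """Sketch of the quoted schedule: SGD, lr 0.1 divided by 10 at epochs
    30 and 60, weight decay 1e-4, momentum 0.9."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[30, 60], gamma=0.1)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```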