Post-Training Sparsity-Aware Quantization
Authors: Gil Shomron, Freddy Gabbay, Samer Kurzum, Uri Weiser
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the impact on model accuracy using PyTorch [26], the ILSVRC-2012 dataset [28], and various CNN models [8, 9, 11, 37, 38] (see Table 1). All models are quantized using a simple uniform min-max quantization, employing symmetric unsigned per-layer quantization for activations and symmetric signed per-kernel quantization for weights. The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the Batch Norm layers' running mean and running variance statistics [29, 33, 35, 36]. In all models, the first convolution layer is left intact, since its input activations, which correspond to the image pixels, do not include many zero values, if any. Quantization is, therefore, performed on all convolution layers, with the exception of the first layer. We present the quantization results in Table 1. Throughout this section, we use SPARQ on top of the 8-bit models (A8W8) and report the accuracy degradation relative to the corresponding FP32 model. A4W8 and A8W4 are presented in Table 1 as references to the worst-case accuracy. (A hedged sketch of this quantization scheme is given after the table.) |
| Researcher Affiliation | Academia | Gil Shomron, Samer Kurzum, and Uri Weiser: Technion Israel Institute of Technology, Haifa, Israel ({gilsho@campus, ssamer15@campus, uri.weiser@ee}.technion.ac.il). Freddy Gabbay: Ruppin Academic Center, Emek Hefer, Israel (freddyg@ruppin.ac.il). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/gilshm/sparq. |
| Open Datasets | Yes | We evaluate the impact on model accuracy using PyTorch [26], the ILSVRC-2012 dataset [28], and various CNN models [8, 9, 11, 37, 38] (see Table 1). |
| Dataset Splits | Yes | The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the Batch Norm layers' running mean and running variance statistics [29, 33, 35, 36]. (A sketch of such a recalibration pass is given after the table.) |
| Hardware Specification | No | The paper describes hardware implementations (systolic arrays, Tensor Cores) and their design and synthesis with a 65nm standard cell library, but it does not specify the hardware (e.g., CPU or GPU models) used to run the deep learning experiments in PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch [26]' but does not provide a specific version number for it or for any other software dependency. |
| Experiment Setup | Yes | All models are quantized using a simple uniform min-max quantization, employing symmetric unsigned per-layer quantization for activations and symmetric signed per-kernel quantization for weights. The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the Batch Norm layers' running mean and running variance statistics [29, 33, 35, 36]. In all models, the first convolution layer is left intact... For the 2:4 structured pruning... retrain the model from scratch for 90 epochs with a learning rate starting from 0.1 and divided by 10 at epochs 30 and 60. Weight decay and momentum are set to 0.0001 and 0.9, respectively. (A sketch of this training schedule is given below.) |
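
For concreteness, below is a minimal PyTorch sketch of the quantization scheme quoted above: symmetric unsigned per-layer quantization for (non-negative, post-ReLU) activations and symmetric signed per-kernel quantization for weights. The function names, the quantize-dequantize ("fake quantization") formulation, and the `1e-8` clamp are illustrative assumptions, not the authors' released implementation (see the linked repository for the actual code). In the paper's setup the activation maximum comes from statistics gathered on 2K calibration images, which is why `act_max` is passed in rather than taken from the current batch.

```python
import torch

def quantize_activations(x, num_bits=8, act_max=None):
    """Symmetric unsigned per-layer fake quantization for non-negative
    (post-ReLU) activations: a single scale for the whole tensor."""
    qmax = 2 ** num_bits - 1
    # Paper setup: act_max is a precomputed min-max statistic from ~2K
    # calibration images; falling back to the batch max is a shortcut.
    max_val = x.max() if act_max is None else torch.as_tensor(act_max, dtype=x.dtype)
    scale = max_val.clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), 0, qmax)
    return q * scale  # quantize-dequantize ("fake quantization")

def quantize_weights(w, num_bits=8):
    """Symmetric signed per-kernel fake quantization: one scale per
    output channel of a conv weight of shape (out_ch, in_ch, kH, kW)."""
    qmax = 2 ** (num_bits - 1) - 1
    w_absmax = w.abs().flatten(1).max(dim=1).values.clamp(min=1e-8)
    scale = (w_absmax / qmax).view(-1, 1, 1, 1)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale
```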
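The Batch Norm recalibration mentioned in the setup can be approximated with a standard recipe: reset each BN layer's running statistics and forward a small calibration set through the model in training mode, without taking optimizer steps. The sketch below assumes a `calib_loader` yielding roughly 2K training images; the cumulative-average trick (`momentum = None`) and the function name are assumptions about one common way to do this, not a description of the authors' exact procedure.

```python
import torch

@torch.no_grad()
def recalibrate_bn(model, calib_loader, num_images=2048):
    """Re-estimate BatchNorm running mean/variance by forwarding calibration
    images through the model; no weights are updated."""
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None  # cumulative moving average over all batches
            m.train()          # only BN layers switch to training mode
    seen = 0
    for images, _ in calib_loader:
        model(images)
        seen += images.size(0)
        if seen >= num_images:
            break
    model.eval()
    return model
```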
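The retraining recipe quoted for the 2:4 structured-pruning baseline (90 epochs, SGD with an initial learning rate of 0.1 divided by 10 at epochs 30 and 60, weight decay 0.0001, momentum 0.9) maps onto standard PyTorch components as sketched below. The function name is hypothetical, and `model` and `train_loader` stand in for the pruned network and the ILSVRC-2012 training pipeline.

```python
import torch

def retrain_2to4_baseline(model, train_loader, epochs=90):
    """Sketch of the quoted schedule: SGD, lr 0.1 divided by 10 at epochs
    30 and 60, weight decay 1e-4, momentum 0.9."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[30, 60], gamma=0.1)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```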