REx: Data-Free Residual Quantization Error Expansion
Authors: Edouard Yvinec, Arnaud Dapogny, Matthieu Cord, Kevin Bailly
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that REx enables better trade-offs (in terms of accuracy given any target bit-width) on both convnets and transformers for computer vision, as well as NLP models. In particular, when applied to large language models, we show that REx elegantly solves the outlier problem that hinders state-of-the-art quantization methods. In addition, REx is backed by strong theoretical guarantees on the preservation of the predictive function of the original model. We show through a thorough empirical validation that, as a standalone method, REx significantly outperforms every state-of-the-art data-free quantization technique, allowing us to find better trade-offs on a variety of benchmarks involving ConvNets for classification, object detection, or semantic segmentation, as well as transformers on GLUE text classification. |
| Researcher Affiliation | Collaboration | Edouard Yvinec¹٫², Arnaud Dapogny², Matthieu Cord¹, Kevin Bailly¹٫² — ¹Sorbonne Université, CNRS, ISIR, F-75005, 4 Place Jussieu, 75005 Paris, France; ²Datakalab, 114 boulevard Malesherbes, 75017 Paris, France; ey@datakalab.com |
| Pseudocode | Yes | Algorithm 1 (Expansion Algorithm). Require: trained DNN f with L layers, hyper-parameters K and γ, operator Q. Initialize the γ_l and initialize f^(K) as a clone of f with K per-layer kernels. For l ∈ {1, …, L}: W ← base kernel of layer l in f; W_acc ← 0 (accumulated quantization error); for k ∈ {1, …, K}: R^(k)_{γ_l} ← Q(W − W_acc) ⊙ I_γ (equation 7); set the k-th kernel of layer l of f^(K) to R^(k)_{γ_l}; W_acc ← W_acc + Q⁻¹(R^(k)_{γ_l}); end for; end for. (A runnable sketch of this expansion loop is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code for REx or a link to a code repository. It mentions adapting the method to existing engines like OpenVino [38] and TensorRT [39]. |
| Open Datasets | Yes | We used ImageNet [33], Pascal VOC 2012 [34], the Cityscapes dataset [35] and GLUE [36] and common sense reasoning benchmarks (details in Appendix D). |
| Dataset Splits | No | The paper mentions using 'ImageNet [33]', 'Pascal VOC 2012 [34]', the 'Cityscapes dataset [35]', and 'GLUE [36]', which are standard benchmarks, but does not explicitly state the specific train/test/validation splits used for their experiments with percentages or sample counts. It refers to a 'calibration/validation set' in a theoretical context, but not for its experimental setup. |
| Hardware Specification | No | The paper makes general statements about hardware, such as 'on a single middle range GPU', and discusses target devices like 'Turing [28]', 'Untether [29]', and 'Nvidia A100 [20]' in the context of quantization capabilities, but does not specify the exact GPU models, CPUs, or other detailed hardware specifications used to run their experiments. |
| Software Dependencies | No | The paper mentions 'CUDA' and 'OpenVino [38] and TensorRT [39]' in the context of implementation and adaptation, but it does not specify any software dependencies with their required version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1') for reproducibility. |
| Experiment Setup | Yes | Unless stated otherwise, we apply symmetric, static, per-channel quantization as defined in [30] and perform batch-normalization folding prior to any processing using the optimal method from [37]. Algorithm 1 (Expansion Algorithm). Require: trained DNN f with L layers, hyper-parameters K and γ, operator Q. (A sketch of standard batch-normalization folding follows the code example after the table.) |
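
To make the quoted Algorithm 1 concrete, here is a minimal NumPy sketch of the per-layer expansion loop. This is an illustration under stated assumptions, not the authors' implementation (none was released): the 4-bit default, the `symmetric_quantize` and `expand_kernel` names, and the choice to return `(codes, scale)` pairs are ours, and the sparsity mask I_γ from equation 7 is omitted for brevity.

```python
import numpy as np

def symmetric_quantize(w, bits=4):
    """Symmetric per-channel quantization Q (assumed form, following [30]).

    Returns integer codes q and a per-output-channel scale such that
    Q^{-1}(q) = q * scale approximately reconstructs w.
    """
    # One scale per output channel (axis 0), as in per-channel quantization.
    reduce_axes = tuple(range(1, w.ndim))
    max_abs = np.max(np.abs(w), axis=reduce_axes, keepdims=True)
    scale = max_abs / (2 ** (bits - 1) - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero channels
    q = np.round(w / scale)
    return q, scale

def expand_kernel(w, num_residuals=3, bits=4):
    """Sketch of the inner loop of Algorithm 1 for one layer.

    Each residual kernel R^(k) quantizes the error left by the previous
    ones; the dequantized sum of all kernels approximates w.
    """
    kernels = []
    w_acc = np.zeros_like(w)  # accumulated dequantized approximation W_acc
    for _ in range(num_residuals):
        q, scale = symmetric_quantize(w - w_acc, bits=bits)  # R^(k) = Q(W - W_acc)
        kernels.append((q, scale))
        w_acc = w_acc + q * scale  # W_acc += Q^{-1}(R^(k))
    return kernels

# Toy check: reconstruction error shrinks as residual kernels are added.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
for k in (1, 2, 3):
    approx = sum(q * s for q, s in expand_kernel(w, num_residuals=k))
    print(k, float(np.max(np.abs(w - approx))))
```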
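
The experiment setup also cites batch-normalization folding via the optimal method from [37]. The paper does not reproduce that method, so the sketch below shows only the standard textbook folding of a BatchNorm layer into the preceding layer's kernel and bias, as a hedged stand-in for the pre-processing step.

```python
import numpy as np

def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Standard batch-normalization folding (textbook version, not [37]).

    A layer followed by BN computes gamma * (w @ x + b - mean) / sqrt(var + eps) + beta,
    so absorbing the per-channel factor into the kernel and bias is exact.
    """
    scale = gamma / np.sqrt(var + eps)                       # one factor per output channel
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))  # scale kernel rows
    b_folded = (b - mean) * scale + beta                     # shift the bias
    return w_folded, b_folded

# Example: fold BN statistics into an (out_channels, in_channels) linear kernel.
w, b = np.ones((4, 3)), np.zeros(4)
gamma, beta = np.full(4, 2.0), np.zeros(4)
mean, var = np.zeros(4), np.ones(4)
w_folded, b_folded = fold_batch_norm(w, b, gamma, beta, mean, var)
```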