REx: Data-Free Residual Quantization Error Expansion
Authors: Edouard Yvinec, Arnaud Dapogny, Matthieu Cord, Kevin Bailly
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that REx enables better trade-offs (in terms of accuracy given any target bit-width) on both convnets and transformers for computer vision, as well as NLP models. In particular, when applied to large language models, we show that REx elegantly solves the outlier problem that hinders state-of-the-art quantization methods. In addition, REx is backed by strong theoretical guarantees on the preservation of the predictive function of the original model. We show through a thorough empirical validation that, as a standalone method, REx significantly outperforms every state-of-the-art data-free quantization technique, allowing us to find better trade-offs on a variety of benchmarks involving ConvNets for classification, object detection, or semantic segmentation, as well as transformers on GLUE text classification. |
| Researcher Affiliation | Collaboration | Edouard Yvinec¹٫², Arnaud Dapogny², Matthieu Cord¹, Kevin Bailly¹٫² — ¹Sorbonne Université, CNRS, ISIR, F-75005, 4 Place Jussieu, 75005 Paris, France; ²Datakalab, 114 boulevard Malesherbes, 75017 Paris, France; ey@datakalab.com |
| Pseudocode | Yes | Algorithm 1 (Expansion Algorithm). Require: trained DNN f with L layers, hyper-parameters K and γ, operator Q. Initialize the γ_l and initialize f^(K) as a clone of f with K per-layer kernels. For l ∈ {1, …, L}: W ← base kernel of layer l in f; W_acc ← 0 (accumulated quantization error); for k ∈ {1, …, K}: R^(k)_{γ_l} ← Q(W − W_acc) ⊙ I_γ (equation 7); set the k-th kernel of layer l of f^(K) to R^(k)_{γ_l}; W_acc ← W_acc + Q⁻¹(R^(k)_{γ_l}); end for; end for. (A runnable sketch of this expansion loop is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code for REx or a link to a code repository. It mentions adapting the method to existing engines like OpenVino [38] and TensorRT [39]. |
| Open Datasets | Yes | We used ImageNet [33], Pascal VOC 2012 [34], the Cityscapes dataset [35] and GLUE [36] and common sense reasoning benchmarks (details in Appendix D). |
| Dataset Splits | No | The paper mentions using 'ImageNet [33]', 'Pascal VOC 2012 [34]', the 'Cityscapes dataset [35]', and 'GLUE [36]', which are standard benchmarks, but does not explicitly state the specific train/test/validation splits used for their experiments with percentages or sample counts. It refers to a 'calibration/validation set' in a theoretical context, but not for its experimental setup. |
| Hardware Specification | No | The paper makes general statements about hardware, such as 'on a single middle range GPU', and discusses target devices like 'Turing [28]', 'Untether [29]', and 'Nvidia A100 [20]' in the context of quantization capabilities, but does not specify the exact GPU models, CPUs, or other detailed hardware specifications used to run their experiments. |
| Software Dependencies | No | The paper mentions 'CUDA' and 'OpenVino [38] and TensorRT [39]' in the context of implementation and adaptation, but it does not specify any software dependencies with their required version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1') for reproducibility. |
| Experiment Setup | Yes | Unless stated otherwise, we apply symmetric, static, per-channel quantization as defined in [30] and perform batch-normalization folding prior to any processing using the optimal method from [37]. Algorithm 1 (Expansion Algorithm). Require: trained DNN f with L layers, hyper-parameters K and γ, operator Q. (A sketch of standard batch-normalization folding follows the code example after the table.) |
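
To make the quoted Algorithm 1 concrete, here is a minimal NumPy sketch of the per-layer expansion loop. This is an illustration under stated assumptions, not the authors' implementation (none was released): the 4-bit default, the `symmetric_quantize` and `expand_kernel` names, and the choice to return `(codes, scale)` pairs are ours, and the sparsity mask I_γ from equation 7 is omitted for brevity.

```python
import numpy as np

def symmetric_quantize(w, bits=4):
    """Symmetric per-channel quantization Q (assumed form, following [30]).

    Returns integer codes q and a per-output-channel scale such that
    Q^{-1}(q) = q * scale approximately reconstructs w.
    """
    # One scale per output channel (axis 0), as in per-channel quantization.
    reduce_axes = tuple(range(1, w.ndim))
    max_abs = np.max(np.abs(w), axis=reduce_axes, keepdims=True)
    scale = max_abs / (2 ** (bits - 1) - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero channels
    q = np.round(w / scale)
    return q, scale

def expand_kernel(w, num_residuals=3, bits=4):
    """Sketch of the inner loop of Algorithm 1 for one layer.

    Each residual kernel R^(k) quantizes the error left by the previous
    ones; the dequantized sum of all kernels approximates w.
    """
    kernels = []
    w_acc = np.zeros_like(w)  # accumulated dequantized approximation W_acc
    for _ in range(num_residuals):
        q, scale = symmetric_quantize(w - w_acc, bits=bits)  # R^(k) = Q(W - W_acc)
        kernels.append((q, scale))
        w_acc = w_acc + q * scale  # W_acc += Q^{-1}(R^(k))
    return kernels

# Toy check: reconstruction error shrinks as residual kernels are added.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
for k in (1, 2, 3):
    approx = sum(q * s for q, s in expand_kernel(w, num_residuals=k))
    print(k, float(np.max(np.abs(w - approx))))
```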
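
The experiment setup also cites batch-normalization folding via the optimal method from [37]. The paper does not reproduce that method, so the sketch below shows only the standard textbook folding of a BatchNorm layer into the preceding layer's kernel and bias, as a hedged stand-in for the pre-processing step.

```python
import numpy as np

def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Standard batch-normalization folding (textbook version, not [37]).

    A layer followed by BN computes gamma * (w @ x + b - mean) / sqrt(var + eps) + beta,
    so absorbing the per-channel factor into the kernel and bias is exact.
    """
    scale = gamma / np.sqrt(var + eps)                       # one factor per output channel
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))  # scale kernel rows
    b_folded = (b - mean) * scale + beta                     # shift the bias
    return w_folded, b_folded

# Example: fold BN statistics into an (out_channels, in_channels) linear kernel.
w, b = np.ones((4, 3)), np.zeros(4)
gamma, beta = np.full(4, 2.0), np.zeros(4)
mean, var = np.zeros(4), np.ones(4)
w_folded, b_folded = fold_batch_norm(w, b, gamma, beta, mean, var)
```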