Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

Authors: Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, Fengwei Yu, Xianglong Liu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments prove that our framework surpasses the existing works and, for the first time, pushes the 6-bit post-training BERT quantization to the full-precision (FP) level.
Researcher Affiliation | Collaboration | Xiuying Wei (1,2), Yunchen Zhang (2,4), Xiangguo Zhang (2), Ruihao Gong (1,2), Shanghang Zhang (3), Qi Zhang (2), Fengwei Yu (2), Xianglong Liu (1). (1) State Key Lab of Software Development Environment, Beihang University; (2) SenseTime Research; (3) Peking University; (4) University of Electronic Science and Technology of China
Pseudocode | No | The paper contains a 'Flow diagram' in Figure 4 but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/wimh966/outlier_suppression.
Open Datasets | Yes | On the whole, we evaluate GLUE benchmark [33], SQuAD [34, 35], and XSum [36] and CNN/Daily Mail [37] across BERT, RoBERTa, and BART models.
Dataset Splits | Yes | For PTQ, equipping our framework, we use 256 samples to calibrate the model. For training, hyper-parameters like learning rate are searched both for our methods and baseline techniques for fair comparisons. Details see Appendix F. (Tables 4 and 5 also show MNLI acc m/mm, indicating matched/mismatched validation set accuracies common in GLUE.)
Hardware Specification | No | The main paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. While the author checklist indicates this information is provided, it is not present in the main body of the paper.
Software Dependencies | No | The paper mentions combining methods with 'LSQ+ [12]' and takes schemes from 'FasterTransformer [38]', but it does not specify software components with version numbers (e.g., PyTorch version, specific library versions).
Experiment Setup | Yes | For PTQ, equipping our framework, we use 256 samples to calibrate the model. For training, hyper-parameters like learning rate are searched both for our methods and baseline techniques for fair comparisons. Details see Appendix F. (Also, 'Here, 4-4-4 presents 4-bit weight, embedding, and activation.' and 'For the percentile, we search the hyper-parameter in [0.999, 0.9999, 0.99999] and report the best on dev set.') An illustrative sketch of this calibration recipe follows the table.
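To make the quoted setup concrete, below is a minimal, hypothetical sketch of percentile-based post-training calibration: activations gathered from 256 calibration samples, a symmetric fake quantizer at a given bit-width, and a clipping percentile searched over [0.999, 0.9999, 0.99999]. This is not the authors' Outlier Suppression implementation; the names fake_quantize, percentile_clip, and search_percentile are invented for illustration, and quantization MSE on the calibration activations stands in for the paper's selection of the best candidate on the dev set.

```python
# Minimal sketch (not the authors' code): percentile-clipped PTQ calibration
# matching the quoted recipe (256 calibration samples, percentile search over
# [0.999, 0.9999, 0.99999]). All function and variable names are hypothetical.
import torch


def fake_quantize(x: torch.Tensor, n_bits: int, clip_max: float) -> torch.Tensor:
    """Symmetric uniform fake quantization with a fixed clipping range."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(clip_max, 1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale


def percentile_clip(acts: torch.Tensor, percentile: float) -> float:
    """Clipping threshold taken as a high percentile of |activations|."""
    vals = acts.abs().flatten().float()
    k = max(1, int(round(percentile * vals.numel())))
    return torch.kthvalue(vals, k).values.item()


@torch.no_grad()
def search_percentile(calib_acts: torch.Tensor, n_bits: int,
                      candidates=(0.999, 0.9999, 0.99999)) -> float:
    """Pick the candidate percentile whose clipping range gives the lowest
    quantization MSE on the calibration activations (the paper instead
    reports the best candidate on the dev set)."""
    best_p, best_err = candidates[0], float("inf")
    for p in candidates:
        clip = percentile_clip(calib_acts, p)
        err = (fake_quantize(calib_acts, n_bits, clip) - calib_acts).pow(2).mean().item()
        if err < best_err:
            best_p, best_err = p, err
    return best_p


# Usage: stand-in activations from 256 calibration samples, calibrated for the
# activation part of a 6-bit setting.
calib_acts = torch.randn(256, 128, 64) * 2.0   # hypothetical hidden states
calib_acts[0, 0, :8] += 40.0                   # a few synthetic outliers
print("chosen percentile:", search_percentile(calib_acts, n_bits=6))
```

In the weight-embedding-activation notation quoted above (e.g., 4-4-4 or 6-6-6), the same kind of quantizer would be applied with the listed bit-widths to the weights, the embedding layer, and the activations, respectively.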