Towards Efficient Post-training Quantization of Pre-trained Language Models

Authors: Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption."
Researcher Affiliation | Collaboration | "Haoli Bai1,2, Lu Hou1, Lifeng Shang1, Xin Jiang1, Irwin King2, Michael R. Lyu2. 1Huawei Noah's Ark Lab, 2The Chinese University of Hong Kong. {baihaoli,houlu3,Shang.Lifeng,Jiang.Xin}@huawei.com, {king,lyu}@cse.cuhk.edu.hk"
Pseudocode | Yes | "Algorithm 1: Efficient PTQ for PLMs. Algorithm 2: MREM algorithm."
Open Source Code | No | The paper does not provide a direct link to a source code repository or an explicit statement that the code is publicly available in the main body. The checklist mentions code availability but provides no URL or reference in the main text.
Open Datasets | Yes | "We evaluate post-training quantization on both the GLUE [45] and SQuAD benchmarks [39]."
Dataset Splits | Yes | "We use the same evaluation metrics in [12, 56] for the development set of GLUE and SQuAD benchmarks. For results in Section 4.2, we report accuracies on both the matched and mismatched sections of MNLI, and EM (exact match) and F1 score for SQuAD."
Hardware Specification | Yes | "The training time and memory in (a) and (b) are measured by 4-bit weights and 8-bit activations (i.e., W4A8) on an NVIDIA V100 GPU. ... By default, we partition the model into 4 modules on 4 NVIDIA V100 GPUs."
Software Dependencies | No | "Our implementation is based on MindSpore [1]." The version number of MindSpore or any other software dependency is not specified.
Experiment Setup | Yes | "For each module, we train for 2,000 steps with an initial learning rate of 1e-4 on GLUE tasks, and 4,000 steps with an initial learning rate of 5e-5 on SQuAD datasets. The learning rate decays linearly as done in [24, 56]. By default, we partition the model into 4 modules on 4 NVIDIA V100 GPUs."
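As a lightweight reference for the W4A8 setting and the linearly decaying learning rate quoted in the table, the sketch below illustrates min-max uniform quantization at the stated bit-widths (4-bit weights, 8-bit activations) and the per-module schedule (2,000 steps at 1e-4 on GLUE, 4,000 steps at 5e-5 on SQuAD). This is a minimal, hypothetical Python/NumPy sketch under those assumptions, not the paper's MindSpore implementation; the function names are illustrative only, and the module-parallel training across 4 V100 GPUs described above is not modeled here.

```python
import numpy as np

def fake_quantize(x, num_bits):
    """Generic min-max uniform (asymmetric) fake quantization.

    Illustrates the bit-widths of the W4A8 setting quoted in the table
    (num_bits=4 for weights, num_bits=8 for activations). This is an
    assumption-level sketch, not the paper's quantizer.
    """
    qmin, qmax = 0, (1 << num_bits) - 1
    span = float(x.max() - x.min())
    scale = span / (qmax - qmin) if span > 0 else 1.0
    zero_point = qmin - x.min() / scale
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale  # dequantized (simulated-quantization) values

def linear_decay_lr(step, total_steps, init_lr):
    """Linearly decaying learning rate, e.g. init_lr=1e-4 over 2,000 steps
    per module on GLUE, or init_lr=5e-5 over 4,000 steps on SQuAD."""
    return init_lr * max(0.0, 1.0 - step / float(total_steps))

# Example: quantize random weights to 4 bits and inspect the schedule.
w = np.random.randn(768, 768).astype(np.float32)
w_q = fake_quantize(w, num_bits=4)
print("max quantization error:", np.abs(w - w_q).max())
print([round(linear_decay_lr(s, 2000, 1e-4), 6) for s in (0, 1000, 2000)])
```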