Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Automatic Mixed-Precision Quantization Search of BERT
Authors: Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on BERT downstream tasks reveal that our proposed method outperforms baselines by providing the same performance with much smaller model size. Extensive experimental validation on various NLP tasks. We evaluate the proposed AQ-BERT on four NLP tasks, including Sentiment Classification, Question Answering, Natural Language Inference, and Named Entity Recognition. |
| Researcher Affiliation | Industry | Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin (Samsung Research America) |
| Pseudocode | Yes | Algorithm 1: The Procedure of AQ-BERT |
| Open Source Code | No | Our implementation is based on transformers by huggingface [1]. The AdamW optimizer is set with learning rate 2e-5, and SGD is set with learning rate 0.1 for architecture optimization. (Footnote 1 points to https://github.com/huggingface/transformers.) |
| Open Datasets | Yes | We evaluate our proposed AQ-BERT and other baselines (BERT-base, Q-BERT, and DistilBERT-base) on four NLP tasks: SST-2, MNLI, CoNLL-2003, and SQuAD. |
| Dataset Splits | Yes | "Input: training set D_train and validation set D_val" (Algorithm 1) and "Calculate L_val on D_val via Equation 14 to update bit assignments O" (Algorithm 1). |
| Hardware Specification | No | The paper does not specify any hardware used for experiments. |
| Software Dependencies | No | Our implementation is based on transformers by huggingface [1]. The AdamW optimizer is set with learning rate 2e-5, and SGD is set with learning rate 0.1 for architecture optimization. No version numbers are given for transformers, AdamW, or SGD. |
| Experiment Setup | Yes | The AdamW optimizer is set with learning rate 2e-5, and SGD is set with learning rate 0.1 for architecture optimization. Both Q-BERT and our method use 8-bit activations. All model sizes reported here exclude the embedding layer, as we uniformly quantize embeddings to 8 bits. |
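The setup above mentions uniform 8-bit quantization of embeddings and activations. As a rough illustration of what uniform quantization means here, a minimal symmetric linear quantizer can be sketched as below. This is a generic sketch, not AQ-BERT's actual implementation (which is not open-sourced); the function names and the per-tensor scaling scheme are assumptions for illustration.

```python
def quantize(values, bits=8):
    """Symmetric uniform quantization: map floats to signed b-bit integers.

    Illustrative sketch only; the paper's mixed-precision search assigns
    different bit widths per layer, which this per-tensor version ignores.
    """
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid div-by-zero scale
    # Round each value to the nearest integer level and clip to the valid range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from integer levels."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, s = quantize(weights, bits=8)
recon = dequantize(q, s)
```

With 8 bits the largest-magnitude value maps to the extreme level (-127 here for -1.2), and every reconstructed value differs from the original by less than one quantization step.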