BiT: Robustly Binarized Multi-distilled Transformer
Authors: Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We follow recent work (Bai et al., 2021; Qin et al., 2021) in adopting the experimental setting of Devlin et al. (2019), and use the pre-trained BERT-base as our full-precision baseline. We evaluate on GLUE (Wang et al., 2019), a varied set of language understanding tasks (see Section A.5 for a full list), as well as SQuAD (v1.1) (Rajpurkar et al., 2016), a popular machine reading comprehension dataset. |
| Researcher Affiliation | Collaboration | Zechun Liu, Reality Labs, Meta Inc., zechunliu@fb.com; Barlas Oguz, Meta AI, barlaso@fb.com; Aasish Pappu, Meta AI, aasish@fb.com; Lin Xiao, Meta AI, linx@fb.com; Scott Yih, Meta AI, scottyih@fb.com; Meng Li, Peking University, meng.li@pku.edu.cn; Raghuraman Krishnamoorthi, Reality Labs, Meta Inc., raghuraman@fb.com; Yashar Mehdad, Meta AI, mehdad@fb.com |
| Pseudocode | Yes | Algorithm 1 BiT: Multi-distillation algorithm (a hedged sketch of this schedule appears below the table) |
| Open Source Code | Yes | Code and models are available at: https://github.com/facebookresearch/bit. |
| Open Datasets | Yes | We evaluate on GLUE (Wang et al., 2019), a varied set of language understanding tasks (see Section A.5 for a full list), as well as SQuAD (v1.1) (Rajpurkar et al., 2016), a popular machine reading comprehension dataset. (A dataset-loading sketch follows the table.) |
| Dataset Splits | Yes | Table 1: Comparison of BERT quantization methods on the GLUE dev set. [...] Table 3: Comparison of BERT quantization methods on the SQuAD v1.1 dev set. |
| Hardware Specification | No | The paper does not specify the hardware used: no GPU or CPU models, memory details, or cloud-provider instance types are given in the text. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | No | The paper references prior work for experimental settings (e.g., "following the exact setup in Zhang et al. (2020)") but does not explicitly state hyperparameters (like learning rate, batch size, number of epochs) or other system-level training configurations within the main text. |
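
The pseudocode row above refers to the paper's Algorithm 1 (multi-distillation). The following is a minimal, hedged Python sketch of that idea only: a full-precision teacher is distilled through a sequence of progressively lower-precision students, with each stage's student initializing and teaching the next, ending at a fully binarized (1-bit weight, 1-bit activation) model. The helper functions `make_student` and `distill`, and the particular bit-width schedule shown, are illustrative placeholders, not the authors' implementation; consult the released code at https://github.com/facebookresearch/bit for the actual algorithm.

```python
# Hedged sketch of a multi-distillation schedule (not the authors' code).
# `make_student`, `distill`, and the schedule below are illustrative assumptions.
from copy import deepcopy


def make_student(model, weight_bits, act_bits):
    """Placeholder: return a copy of `model` configured for the given bit-widths."""
    student = deepcopy(model)
    student.weight_bits, student.act_bits = weight_bits, act_bits
    return student


def distill(teacher, student, data):
    """Placeholder: train `student` to mimic `teacher` on `data`.

    In practice this would minimize a distillation loss (e.g. a divergence on
    logits plus an MSE on intermediate representations) by gradient descent.
    """
    return student


def multi_distill(full_precision_model, data,
                  schedule=((1, 8), (1, 4), (1, 2), (1, 1))):
    """Distill down a precision schedule; the previous student becomes the
    teacher for the next stage, finishing with a binarized (W1A1) model."""
    teacher = full_precision_model
    for w_bits, a_bits in schedule:
        student = make_student(teacher, w_bits, a_bits)
        student = distill(teacher, student, data)
        teacher = student  # previous student teaches the next, lower-precision stage
    return teacher
```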
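The open-datasets row names GLUE and SQuAD v1.1, both publicly available. Below is a minimal loading sketch assuming the Hugging Face `datasets` package; the paper does not prescribe a particular loader, and the SST-2 configuration is shown only as one example GLUE task.

```python
# Minimal sketch for accessing the evaluation data named in the paper,
# assuming the Hugging Face `datasets` package (an assumption, not the paper's setup).
from datasets import load_dataset

# GLUE is a collection of tasks; "sst2" is one example configuration.
sst2 = load_dataset("glue", "sst2")
# SQuAD v1.1 machine reading comprehension.
squad = load_dataset("squad")

# The paper's Tables 1 and 3 report results on the dev (validation) splits.
print(sst2["validation"][0])
print(squad["validation"][0])
```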