BERT Loses Patience: Fast and Robust Inference with Early Exit
Authors: Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the GLUE benchmark and show that PABEE outperforms existing prediction probability distribution-based exit criteria by a large margin. |
| Researcher Affiliation | Collaboration | Wangchunshu Zhou¹, Canwen Xu², Tao Ge³, Julian McAuley², Ke Xu¹, Furu Wei³; ¹Beihang University, ²University of California, San Diego, ³Microsoft Research Asia |
| Pseudocode | No | The paper describes the inference and training processes using mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code available at https://github.com/JetRunner/PABEE. |
| Open Datasets | Yes | We evaluate our proposed approach on the GLUE benchmark [35]. |
| Dataset Splits | Yes | We apply an early stopping mechanism and select the model with the best performance on the development set. |
| Hardware Specification | Yes | We conduct our experiments on a single Nvidia V100 16GB GPU. |
| Software Dependencies | No | The paper mentions implementing PABEE on Hugging Face's Transformers [43] but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | We perform grid search over batch sizes of {16, 32, 128}, and learning rates of {1e-5, 2e-5, 3e-5, 5e-5} with an Adam optimizer. We apply an early stopping mechanism and select the model with the best performance on the development set. |
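Although the paper provides no labeled pseudocode block, its patience-based exit rule is simple to state: attach an internal classifier to each layer, and stop inference as soon as `patience` consecutive classifiers agree on the same label. The sketch below is a minimal, hedged illustration of that rule, not the authors' implementation; `layer_predictions` is an assumed stand-in for the per-layer classifiers' argmax outputs.

```python
def pabee_exit(layer_predictions, patience):
    """Patience-based early exit (PABEE), sketched from the paper's description.

    layer_predictions: predicted labels from each layer's internal
        classifier, in layer order (a simplification: the real model
        computes these on the fly and stops running layers on exit).
    patience: number of consecutive agreements required before exiting.
    Returns (prediction, number_of_layers_used).
    """
    cnt = 0        # consecutive-agreement counter
    prev = None    # previous layer's prediction
    for i, pred in enumerate(layer_predictions, start=1):
        # Increment the counter when this layer agrees with the
        # previous one; reset it otherwise.
        cnt = cnt + 1 if pred == prev else 0
        prev = pred
        if cnt >= patience:
            return pred, i  # early exit: skip the remaining layers
    # No early exit: fall back to the final classifier's prediction.
    return prev, len(layer_predictions)
```

For example, with `patience=2` the sequence `[0, 1, 1, 1, 2]` exits at layer 4 with label 1, since layers 2-4 agree twice in a row, while an unstable sequence like `[0, 1, 0, 1]` runs all layers. Larger `patience` trades speed for robustness, which is the knob the paper's experiments vary.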