Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Authors: Saurabh Goyal, Anamitra Roy Choudhury, Saurabh Raje, Venkatesan Chakaravarthy, Yogish Sabharwal, Ashish Verma
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an experimental evaluation on a wide spectrum of classification/regression tasks from the popular GLUE benchmark. The results show that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT_BASE with < 1% loss in accuracy. |
| Researcher Affiliation | Industry | (1) IBM Research, New Delhi, India; (2) IBM Research, Yorktown, New York, USA. |
| Pseudocode | No | The paper describes the PoWER-BERT scheme and its components textually and with figures, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT. |
| Open Datasets | Yes | We evaluate our approach on a wide spectrum of classification/regression tasks pertaining to 9 datasets from the GLUE benchmark (Wang et al., 2019a), and the IMDB (Maas et al., 2011) and the RACE (Lai et al., 2017) datasets. |
| Dataset Splits | Yes | The hyper-parameters for both PoWER-BERT and the baseline methods were tuned on the Dev dataset for GLUE and RACE tasks. For IMDB, we subdivided the training data into 80% for training and 20% for tuning. |
| Hardware Specification | Yes | The inference time experiments for PoWER-BERT and the baselines were conducted using Keras framework on a K80 GPU machine. |
| Software Dependencies | No | The paper mentions that the code was 'implemented in Keras' but does not specify version numbers for Keras or any other software dependencies. |
| Experiment Setup | Yes | Training PoWER-BERT primarily involves four hyper-parameters, which we select from the ranges listed below: a) learning rate for the newly introduced soft-extract layers: $[10^{-4}, 10^{-2}]$; b) learning rate for the parameters from the original BERT model: $[2 \times 10^{-5}, 6 \times 10^{-5}]$; c) regularization parameter λ that controls the trade-off between accuracy and inference time: $[10^{-4}, 10^{-3}]$; d) batch size: {4, 8, 16, 32, 64}. |
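
The hyper-parameter ranges in the Experiment Setup row and the 80%/20% IMDB split in the Dataset Splits row can be written down as a small configuration sketch. The block below is illustrative only and is not taken from the PoWER-BERT repository: the names `SEARCH_SPACE`, `sample_config`, and `split_train_tune` are hypothetical, log-uniform sampling is an assumption (the quoted text says only that values are selected from these ranges), and the example count of 1000 is a placeholder.

```python
import math
import random

# Ranges as quoted in the Experiment Setup row.
SEARCH_SPACE = {
    "soft_extract_lr": (1e-4, 1e-2),  # learning rate of the soft-extract layers
    "bert_lr": (2e-5, 6e-5),          # learning rate of the original BERT parameters
    "lambda_reg": (1e-4, 1e-3),       # accuracy vs. inference-time trade-off
}
BATCH_SIZES = [4, 8, 16, 32, 64]


def sample_config(rng):
    """Draw one configuration. Log-uniform sampling is an assumption."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
        # Clamp to guard against floating-point rounding at the boundaries.
        config[name] = min(max(value, low), high)
    config["batch_size"] = rng.choice(BATCH_SIZES)
    return config


def split_train_tune(num_examples, rng, train_fraction=0.8):
    """Shuffle example indices and split them into train and tune parts."""
    indices = list(range(num_examples))
    rng.shuffle(indices)
    cut = round(num_examples * train_fraction)
    return indices[:cut], indices[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
config = sample_config(rng)
train_idx, tune_idx = split_train_tune(1000, rng)  # 1000 is a placeholder count
```

Fixing the seed makes the sampled configuration and the split repeatable, which matters here because the table reports no software versions and the split procedure itself is not described beyond the 80/20 ratio.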