SKDBERT: Compressing BERT via Stochastic Knowledge Distillation

Authors: Zixiang Ding, Guoqing Jiang, Shuai Zhang, Lin Guo, Wei Lin

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the GLUE benchmark show that SKDBERT reduces the size of a BERT model by 40% while retaining 99.5% of its language understanding performance and being 100% faster.
Researcher Affiliation | Industry | Zixiang Ding, Guoqing Jiang, Shuai Zhang and Lin Guo (Meituan); Wei Lin (Individual). {dingzixiang, jiangguoqing03, zhangshuai51, guolin08}@meituan.com, lwsaviola@163.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. While it references the supplementary materials PDF and third-party code, it does not provide a link to the authors' own implementation code.
Open Datasets | Yes | We evaluate SKDBERT on the GLUE benchmark including MRPC (Dolan and Brockett 2005), RTE (Bentivogli et al. 2009), STS-B (Cer et al. 2017), SST-2 (Socher et al. 2013), QQP (Chen et al. 2018), QNLI (Rajpurkar et al. 2016) and MNLI (Williams, Nangia, and Bowman 2017).
Dataset Splits | Yes | Table 1: Distillation performances of our student with single and multiple teachers on the development set of GLUE benchmark (Wang et al. 2019).
Hardware Specification | Yes | Moreover, all implementations are performed on NVIDIA A100 GPU.
Software Dependencies | No | The paper does not provide ancillary software dependencies with version numbers (e.g., specific library or solver versions).
Experiment Setup | No | The paper notes that hyperparameters are given in Section E of the supplementary materials, but does not provide concrete hyperparameter values or training configurations in the main text.
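
Since the Pseudocode and Open Source Code rows indicate that no algorithm listing or implementation is released, the snippet below is only a minimal, speculative sketch of the general stochastic knowledge distillation idea named in the title: at each training step one teacher is sampled from a team of teachers and the student is distilled from that teacher's soft targets. The function name, the HuggingFace-style model interface (models returning `.logits`), the temperature, and the KL loss are all assumptions for illustration, not the authors' method.

import random

import torch
import torch.nn.functional as F


def skd_step(student, teachers, sampling_probs, batch, optimizer, temperature=1.0):
    """One hypothetical distillation step with a stochastically sampled teacher."""
    # Sample a single teacher for this step according to the given distribution.
    teacher = random.choices(teachers, weights=sampling_probs, k=1)[0]

    # Assumes HuggingFace-style sequence classification models that return .logits.
    with torch.no_grad():
        teacher_logits = teacher(
            batch["input_ids"], attention_mask=batch["attention_mask"]
        ).logits
    student_logits = student(
        batch["input_ids"], attention_mask=batch["attention_mask"]
    ).logits

    # Temperature-scaled soft-target KL distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()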
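
The Open Datasets and Dataset Splits rows point to the GLUE tasks and their development sets. The sketch below shows one common way to obtain the same data and splits with the HuggingFace `datasets` library; the paper does not state which toolkit it actually uses, so this is an assumption for illustration only.

from datasets import load_dataset

# GLUE task names as exposed by the HuggingFace "glue" dataset.
GLUE_TASKS = ["mrpc", "rte", "stsb", "sst2", "qqp", "qnli", "mnli"]

for task in GLUE_TASKS:
    ds = load_dataset("glue", task)
    # MNLI exposes matched/mismatched dev splits; the other tasks use "validation".
    dev_split = "validation_matched" if task == "mnli" else "validation"
    print(task, len(ds[dev_split]))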