reproducibilityindex.ai

SKDBERT: Compressing BERT via Stochastic Knowledge Distillation

Authors: Zixiang Ding, Guoqing Jiang, Shuai Zhang, Lin Guo, Wei Lin

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on GLUE benchmark show that SKDBERT reduces the size of a BERT model by 40% while retaining 99.5% performances of language understanding and being 100% faster.
Researcher Affiliation	Industry	Zixiang Ding*¹, Guoqing Jiang¹, Shuai Zhang¹, Lin Guo¹, Wei Lin²; ¹Meituan, ²Individual; {dingzixiang, jiangguoqing03, zhangshuai51, guolin08}@meituan.com, lwsaviola@163.com
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide concrete access to source code for its methodology. Although it references a supplementary-materials PDF and third-party code, it does not link to the authors' own implementation.
Open Datasets	Yes	We evaluate SKDBERT on the GLUE benchmark including MRPC (Dolan and Brockett 2005), RTE (Bentivogli et al. 2009), STS-B (Cer et al. 2017), SST-2 (Socher et al. 2013), QQP (Chen et al. 2018), QNLI (Rajpurkar et al. 2016) and MNLI (Williams, Nangia, and Bowman 2017).
Dataset Splits	Yes	Table 1: Distillation performances of our student with single and multiple teachers on the development set of GLUE benchmark (Wang et al. 2019).
Hardware Specification	Yes	Moreover, all implementations are performed on NVIDIA A100 GPU.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers).
Experiment Setup	No	The paper states that its hyperparameters are listed in Section E of the supplementary materials, but the main text does not give concrete hyperparameter values or training configurations.
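The headline figures quoted in the Research Type row can be restated as ratios relative to the original BERT model. This reading rests on two assumptions that the quoted abstract does not spell out. First, "100% faster" is taken to mean twice the inference speed, so per-example latency halves. Second, "retaining 99.5% performances" is taken as the ratio of the student's GLUE development-set score to the teacher's.

```latex
\frac{\text{size}_{\text{SKDBERT}}}{\text{size}_{\text{BERT}}} = 1 - 0.40 = 0.60, \qquad
\frac{\text{speed}_{\text{SKDBERT}}}{\text{speed}_{\text{BERT}}} = 1 + 1.00 = 2
\;\Rightarrow\;
\frac{\text{latency}_{\text{SKDBERT}}}{\text{latency}_{\text{BERT}}} = \frac{1}{2}, \qquad
\frac{\text{score}_{\text{SKDBERT}}}{\text{score}_{\text{BERT}}} \approx 0.995
```

Under this reading, the student keeps 60% of the parameters, runs in about half the time, and loses roughly half a percentage point of relative score.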