LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Authors: Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
AAAI 2021, pp. 12830-12838 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, by verifying 9 datasets on the General Language Understanding Evaluation (GLUE) benchmark, the performance of the proposed LRC-BERT exceeds the existing state-of-the-art methods, which proves the effectiveness of our method. |
| Researcher Affiliation | Collaboration | School of Computer Science and Technology, University of Science and Technology of China; Alibaba Group |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate LRC-BERT on GLUE benchmark. The datasets provided on GLUE were all from NLP datasets with high recognition. We evaluate LRC-BERT in tasks such as natural language reasoning, emotion analysis, reading comprehension and semantic similarity. |
| Dataset Splits | No | The paper refers to using 'dev' sets for evaluation (e.g., 'The evaluation results of these four tasks on dev are shown in Table 3.'), and provides training sample counts for datasets in Table 1, but it does not specify explicit percentages or absolute counts for training, validation, and test splits or the methodology for these splits (e.g., '80/10/10 split'). |
| Hardware Specification | Yes | We distill our student model with 6 V100 in the pretraining stage, and 4 V100 for distillation training on specific task dataset and extended dataset. In the inference experiments, we report the results of the student on a single V100. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | For the distillation of each task on GLUE, we fine-tune a BERT-base teacher, choosing learning rates of 5e-5, 1e-4, and 3e-4 with a batch size of 16 to distill LRC-BERT and LRC-BERT1. For each sample, we choose the remaining 15 samples in the batch as negative samples, i.e. K = 15. Among them, 90 epochs of distillation are performed on MRPC, RTE, and CoLA, whose training sets contain fewer than 10K samples, and 18 epochs of distillation on the other tasks. For the proposed two-stage training method, the first 80% of the steps are chosen as the first stage of training and the remaining 20% of the steps form the second stage. We set the parameters of the second stage to α : β : γ = 1 : 1 : 3, with a search range of {1, 2, 3, 4} for each parameter. For the temperature hyperparameter τ, we set it to 1.1. (Hedged sketches of these settings appear after the table.) |
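
The Open Datasets row above notes that evaluation uses the public GLUE benchmark, but the paper does not say how the data was obtained. Below is a minimal loading sketch, assuming the Hugging Face `datasets` library (a tool not mentioned in the paper), using the GLUE task names that library exposes:

```python
# Hypothetical loading sketch: the paper does not specify data tooling.
# Assumes the Hugging Face `datasets` package is installed.
from datasets import load_dataset

# GLUE task names as exposed by the `datasets` library; the paper evaluates
# tasks such as MRPC, RTE, CoLA, and the larger GLUE sets.
GLUE_TASKS = ["cola", "sst2", "mrpc", "qqp", "stsb", "mnli", "qnli", "rte", "wnli"]

for task in GLUE_TASKS:
    ds = load_dataset("glue", task)
    # The report notes that only "dev" (validation) results are discussed;
    # GLUE test labels are withheld by the benchmark server.
    print(task, {split: len(ds[split]) for split in ds})
```

Printing the split names rather than assuming them matters because some tasks (e.g. MNLI) expose matched and mismatched validation sets.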
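
The Experiment Setup row reports a batch size of 16 with K = 15 in-batch negatives, a temperature τ of 1.1, and second-stage loss weights α : β : γ = 1 : 1 : 3. Since no official code is linked, the sketch below is only an illustrative PyTorch reading of those numbers: an InfoNCE-style contrastive term over intermediate representations with in-batch negatives, combined with soft-label and hard-label terms. The exact angle-based similarity used by LRC-BERT and the assignment of α, β, γ to specific terms are assumptions here; every function is a hypothetical stand-in, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

TAU = 1.1                            # temperature reported in the paper
ALPHA, BETA, GAMMA = 1.0, 1.0, 3.0   # second-stage weights alpha:beta:gamma = 1:1:3


def contrastive_distill_loss(student_h, teacher_h, tau=TAU):
    """In-batch contrastive loss between student and teacher representations.

    student_h, teacher_h: (B, d) intermediate-layer outputs for the same B
    inputs; with B = 16 each sample has K = 15 in-batch negatives. This is an
    illustrative InfoNCE-style formulation, not necessarily the exact
    angle-based similarity used in LRC-BERT.
    """
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.T / tau                          # (B, B): diagonal entries are positives
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)


def second_stage_loss(student_h, teacher_h, student_logits, teacher_logits, labels):
    """Weighted combination for the second training stage (which weight applies
    to which term is an assumption made for this sketch)."""
    l_contrast = contrastive_distill_loss(student_h, teacher_h)
    l_soft = F.kl_div(F.log_softmax(student_logits, dim=-1),
                      F.softmax(teacher_logits, dim=-1),
                      reduction="batchmean")
    l_hard = F.cross_entropy(student_logits, labels)
    return ALPHA * l_contrast + BETA * l_soft + GAMMA * l_hard


# Toy shapes only: batch of 16 (so K = 15 negatives), hidden size 768, 2 classes.
if __name__ == "__main__":
    B, d, C = 16, 768, 2
    loss = second_stage_loss(torch.randn(B, d), torch.randn(B, d),
                             torch.randn(B, C), torch.randn(B, C),
                             torch.randint(0, C, (B,)))
    print(loss.item())
```

The 80%/20% step split described in the row would then amount to optimizing only the contrastive term for the first 80% of steps and switching to `second_stage_loss` for the remaining 20%.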