Reinforced Multi-Teacher Selection for Knowledge Distillation

Authors: Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

AAAI 2021, pp. 14284-14291 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental results on several NLP tasks clearly verify the feasibility and effectiveness of our approach.
Researcher Affiliation | Collaboration | 1 University of Electronic Science and Technology of China; 2 Microsoft STCA NLP Group; 3 School of Computing Science, Simon Fraser University
Pseudocode | Yes | Algorithm 1: Overall Training Procedure
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is made available, nor does it provide a direct link to a repository.
Open Datasets | Yes | We evaluate our proposed approach on three different NLP tasks from the GLUE benchmark (Wang et al. 2019), namely Sentiment Classification (SC), Paraphrase Similarity Matching (PSM) and Natural Language Inference (NLI).
Dataset Splits | Yes | The statistics of the data sets are shown in Table 2. We use prediction accuracy as the metric in evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions software components such as Patient KD and BERT-Base but does not specify their version numbers or other software dependencies.
Experiment Setup | Yes | The learning rate is set to {1e-5, 2e-5, 5e-5}, the batch size to 32, the maximum sequence length to 128, and the number of epochs to 4. The student models... we set the batch size to 32, the number of epochs to 4, the maximum sequence length to 128, the learning rate to {1e-5, 2e-5, 5e-5}, the distillation temperature T to {5, 10, 20}, and the loss equilibrium coefficient α to {0.2, 0.5, 0.7}. We choose the best model based on performance on the development set. The γ in the experiments ranges over {0.3, 0.5, 0.7, 0.9} and is selected based on development set performance.