ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Authors: Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
AAAI 2021, pp. 13657-13665 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our combinatorial approach is able to outperform other existing techniques. ... A common practice in our field to evaluate the quality of a KD technique is to feed T and S models with instances of standard datasets and measure how they perform. |
| Researcher Affiliation | Industry | Peyman Passban²,*, Yimeng Wu¹, Mehdi Rezagholizadeh¹, Qun Liu¹ (¹Huawei Noah's Ark Lab, ²Amazon) |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We followed the same tradition in this paper and selected a set of eight GLUE tasks (Wang et al. 2018) including CoLA, MNLI, MRPC, QNLI, QQP, RTE, SST-2, and STS-B datasets to benchmark our models. Detailed information about datasets is available in the appendix section. |
| Dataset Splits | Yes | Similar to other papers, we evaluate our models on validation sets. Test-set labels of GLUE datasets are not publicly available and researchers need to participate in leaderboard competitions to evaluate their models on test sets. CoLA: A corpus of English sentences drawn from books and journal articles with 8,551 training and 1,043 validation instances. |
| Hardware Specification | Yes | Each model is fine-tuned on a single NVIDIA 32GB V100 GPU. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., BERT, Transformer blocks) and implicitly uses common ML libraries, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | In our setting, the batch size is set to 32 and the learning rate is selected from {1e-5, 2e-5, 5e-5}. η and λ take values from {0, 0.2, 0.5, 0.7} and β = 1 − η − λ. |
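
As a quick check of the dataset and split details quoted above, the sketch below loads the same eight GLUE tasks and prints their per-split sizes. The Hugging Face `datasets` library and its task identifiers are our assumptions; the paper does not say how the data was obtained.

```python
from datasets import load_dataset  # Hugging Face `datasets` (assumed; not named in the paper)

# The eight GLUE tasks the paper benchmarks on, under their hub identifiers.
GLUE_TASKS = ["cola", "mnli", "mrpc", "qnli", "qqp", "rte", "sst2", "stsb"]

for task in GLUE_TASKS:
    ds = load_dataset("glue", task)
    # GLUE test labels are withheld (leaderboard-only), so the paper
    # evaluates on validation sets. Note that MNLI ships two validation
    # splits (matched/mismatched) rather than a single one.
    sizes = {name: len(split) for name, split in ds.items()
             if not name.startswith("test")}
    print(task, sizes)
```

For CoLA, this reproduces the 8,551 training and 1,043 validation instances quoted in the Dataset Splits row.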
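
The Experiment Setup row pins down a small hyperparameter grid. Below is a minimal sketch of that search space; only the grid values and the constraint β = 1 − η − λ come from the paper, while the configuration keys and the non-negativity filter on β are illustrative assumptions.

```python
import itertools

# Values reported in the paper's experiment setup.
BATCH_SIZE = 32
LEARNING_RATES = [1e-5, 2e-5, 5e-5]
ETA_LAMBDA_VALUES = [0.0, 0.2, 0.5, 0.7]

def configs():
    """Enumerate learning rate x (eta, lambda), with beta = 1 - eta - lambda."""
    for lr, (eta, lam) in itertools.product(
            LEARNING_RATES, itertools.product(ETA_LAMBDA_VALUES, repeat=2)):
        beta = 1.0 - eta - lam
        if beta < 0.0:
            continue  # skip weight combinations summing past 1 (assumed filter)
        yield {"batch_size": BATCH_SIZE, "lr": lr,
               "eta": eta, "lambda": lam, "beta": beta}

if __name__ == "__main__":
    for cfg in configs():
        print(cfg)
```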