Distilling Knowledge from Well-Informed Soft Labels for Neural Relation Extraction

Authors: Zhenyu Zhang, Xiaobo Shu, Bowen Yu, Tingwen Liu, Jiapeng Zhao, Quangang Li, Li Guo

AAAI 2020, pp. 9620-9627

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the TACRED and SemEval datasets; the experimental results justify the effectiveness of our approach.
Researcher Affiliation | Academia | Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China. {zhangzhenyu1996, shuxiaobo, yubowen, liutingwen, zhaojiapeng, liquangang, guoli}@iie.ac.cn
Pseudocode | No | The paper includes architectural diagrams (Figure 2) and describes its methods in text, but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code can be obtained from https://github.com/zzysay/KD4NRE.
Open Datasets | Yes | We conduct experiments on two widely used benchmark datasets: (1) TACRED (Zhang et al. 2017)... (2) SemEval (Hendrickx et al. 2010)...
Dataset Splits | Yes | Table 1: Statistics of the TACRED and SemEval datasets. #Train #Dev #Test
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU specifications, or memory used for the experiments; it only implies that models were trained and experiments were run.
Software Dependencies | No | The paper mentions using GloVe (Pennington, Socher, and Manning 2014) vectors and the Stanford CoreNLP toolkit, but it does not specify version numbers for these software components, which are needed for reproducibility.
Experiment Setup | Yes | We set the NA probability C to 0.2, the temperature of knowledge distillation τ to 1, the weight factor of hint learning λ_ht to 1.8, and the weight factor of type constraints in Teacher-S γ_s to 0.8. The sizes of the position embedding d_p and the NER tag embedding d_n in MAA are both set to 30. Inspired by Clark et al. (2019), we adopt the teacher annealing strategy: λ_kd increases linearly from 0 to 1 throughout the student's training stage.
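To make the annealed distillation objective described above concrete, the following is a minimal sketch, not the authors' implementation (see the released KD4NRE repository for the actual loss). The function name kd_objective, the step/total_steps schedule variables, and the exact way λ_kd mixes the gold-label and soft-label terms are illustrative assumptions based only on the setup quoted in the table.

```python
import torch.nn.functional as F

def kd_objective(student_logits, teacher_logits, gold_labels,
                 step, total_steps, temperature=1.0):
    """Illustrative distillation loss with a linear teacher-annealing schedule.

    Assumption: lambda_kd weights the soft-label (teacher) term and grows
    linearly from 0 to 1 over the student's training, as quoted above; the
    actual loss composition in KD4NRE may differ.
    """
    lambda_kd = min(step / total_steps, 1.0)  # linear 0 -> 1 schedule

    # Cross-entropy against the gold relation labels (hard targets).
    hard_loss = F.cross_entropy(student_logits, gold_labels)

    # KL divergence between temperature-softened teacher and student
    # distributions (standard soft-label distillation term).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Interpolate between the two objectives with the annealed weight.
    return (1.0 - lambda_kd) * hard_loss + lambda_kd * soft_loss
```

With the paper's reported temperature of τ = 1, the soft-label term reduces to a plain KL divergence between the teacher's and student's output distributions; the annealing schedule then shifts the student's supervision gradually from the gold labels toward those soft labels as training proceeds.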