Distilling Knowledge from Well-Informed Soft Labels for Neural Relation Extraction

Authors: Zhenyu Zhang, Xiaobo Shu, Bowen Yu, Tingwen Liu, Jiapeng Zhao, Quangang Li, Li Guo

AAAI 2020, pp. 9620-9627

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the TACRED and SemEval datasets; the experimental results justify the effectiveness of our approach.
Researcher Affiliation | Academia | Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China. {zhangzhenyu1996, shuxiaobo, yubowen, liutingwen, zhaojiapeng, liquangang, guoli}@iie.ac.cn
Pseudocode | No | The paper includes architectural diagrams (Figure 2) and describes its methods in text, but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code can be obtained from https://github.com/zzysay/KD4NRE.
Open Datasets | Yes | We conduct experiments on two widely used benchmark datasets: (1) TACRED (Zhang et al. 2017)... (2) SemEval (Hendrickx et al. 2010)...
Dataset Splits | Yes | Table 1: Statistics of the TACRED and SemEval datasets. #Train #Dev #Test
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU specifications, or memory used for the experiments; it only implies that models were trained and experiments were run.
Software Dependencies | No | The paper mentions using GloVe (Pennington, Socher, and Manning 2014) vectors and the Stanford CoreNLP toolkit, but it does not specify version numbers for these software components, which are needed for reproducibility.
Experiment Setup | Yes | We set the NA probability C to 0.2, the temperature of knowledge distillation τ to 1, the weight factor of hint learning λ_ht to 1.8, and the weight factor of type constraints in Teacher-S γ_s to 0.8. The sizes of the position embedding d_p and the NER tag embedding d_n in MAA are both set to 30. Inspired by Clark et al. (2019), we adopt the teacher annealing strategy: λ_kd increases linearly from 0 to 1 throughout the student's training stage.
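To make the annealed distillation objective described above concrete, the following is a minimal sketch, not the authors' implementation (see the released KD4NRE repository for the actual loss). The function name kd_objective, the step/total_steps schedule variables, and the exact way λ_kd mixes the gold-label and soft-label terms are illustrative assumptions based only on the setup quoted in the table.

```python
import torch.nn.functional as F

def kd_objective(student_logits, teacher_logits, gold_labels,
                 step, total_steps, temperature=1.0):
    """Illustrative distillation loss with a linear teacher-annealing schedule.

    Assumption: lambda_kd weights the soft-label (teacher) term and grows
    linearly from 0 to 1 over the student's training, as quoted above; the
    actual loss composition in KD4NRE may differ.
    """
    lambda_kd = min(step / total_steps, 1.0)  # linear 0 -> 1 schedule

    # Cross-entropy against the gold relation labels (hard targets).
    hard_loss = F.cross_entropy(student_logits, gold_labels)

    # KL divergence between temperature-softened teacher and student
    # distributions (standard soft-label distillation term).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Interpolate between the two objectives with the annealed weight.
    return (1.0 - lambda_kd) * hard_loss + lambda_kd * soft_loss
```

With the paper's reported temperature of τ = 1, the soft-label term reduces to a plain KL divergence between the teacher's and student's output distributions; the annealing schedule then shifts the student's supervision gradually from the gold labels toward those soft labels as training proceeds.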