Exploring Self-Distillation Based Relational Reasoning Training for Document-Level Relation Extraction

Authors: Liang Zhang, Jinsong Su, Zijun Min, Zhongjian Miao, Qingguo Hu, Biao Fu, Xiaodong Shi, Yidong Chen

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct comprehensive experiments on three benchmark datasets, of which experimental results demonstrate that our model consistently outperforms all competitive baselines.
Researcher Affiliation | Academia | School of Informatics, Xiamen University, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China; lzhang@stu.xmu.edu.cn, {jssu,ydchen}@xmu.edu.cn
Pseudocode | No | The paper describes the model architecture and training process in narrative text and mathematical formulas, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code is available at https://github.com/DeepLearnXMU/DocRE-SD.
Open Datasets | Yes | We evaluate our model on three commonly-used datasets: DocRED (Yao et al. 2019). It is a large-scale human-annotated dataset for document-level RE, which is constructed from Wikipedia and Wikidata. ... CDR (Li et al. 2016). It is a biomedical dataset and consists of 1,500 PubMed abstracts, which are equally divided into three sets for training, development, and testing. ... GDA (Wu et al. 2019). This dataset is a large-scale biomedical one, which is constructed from MEDLINE abstracts by method of distant supervision.
Dataset Splits | Yes | We follow the standard split of the dataset, 3,053 documents for training, 1,000 for development, and 1,000 for the test. ... CDR (Li et al. 2016). It is a biomedical dataset and consists of 1,500 PubMed abstracts, which are equally divided into three sets for training, development, and testing. ... We follow Tang et al. (2020) to divide the training set into two parts, 23,353 documents for training and 5,839 for development.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. It only mentions the use of pre-trained models (BERT-base, RoBERTa-large, SciBERT-base) and the PyTorch framework.
Software Dependencies | No | Using PyTorch, we develop our model based on Huggingface's Transformers (Wolf et al. 2020).
Experiment Setup | Yes | Using PyTorch, we develop our model based on Huggingface's Transformers (Wolf et al. 2020). We use BERT-base (Devlin et al. 2019) or RoBERTa-large (Liu et al. 2019) as the encoder on DocRED, and SciBERT-base (Beltagy, Lo, and Cohan 2019) on CDR and GDA. We employ AdamW (Loshchilov and Hutter 2019) to optimize our model with a linear warmup (Goyal et al. 2017) for the first 6% of steps. We empirically set the layer number L of the reasoning module to 2. We apply dropout (Srivastava et al. 2014) between layers with rate 0.1, and clip the gradients of model parameters to a maximal norm of 1.0. All hyper-parameters are tuned on the development set. (A configuration sketch follows below.)
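To make the quoted experiment setup concrete, the following is a minimal PyTorch/Transformers sketch, not the authors' implementation (their code is at the GitHub link above). The encoder checkpoint name, learning rate, number of training steps, the ReasoningHead placeholder module, and the toy loss are illustrative assumptions; only AdamW, the linear warmup over the first 6% of steps, the L = 2 reasoning layers, dropout 0.1, and gradient clipping at norm 1.0 are taken from the excerpt.

```python
# Sketch of the reported training configuration (assumptions flagged inline).
import torch
import torch.nn as nn
from torch.optim import AdamW
from transformers import AutoModel, AutoTokenizer, get_linear_schedule_with_warmup


class ReasoningHead(nn.Module):
    """Placeholder standing in for the paper's L-layer reasoning module."""

    def __init__(self, hidden_size: int, num_layers: int = 2, dropout: float = 0.1):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU(), nn.Dropout(dropout))
             for _ in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = x + layer(x)  # simple residual stack; the real reasoning layers differ
        return x


# Encoder choice per the excerpt: BERT-base / RoBERTa-large on DocRED, SciBERT on CDR/GDA.
encoder_name = "bert-base-cased"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
encoder = AutoModel.from_pretrained(encoder_name)
head = ReasoningHead(encoder.config.hidden_size, num_layers=2, dropout=0.1)  # L = 2, dropout 0.1

params = list(encoder.parameters()) + list(head.parameters())
optimizer = AdamW(params, lr=5e-5)  # learning rate is an assumption; not stated in the excerpt

num_training_steps = 1000  # assumed; in practice len(train_loader) * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.06 * num_training_steps),  # linear warmup for the first 6% of steps
    num_training_steps=num_training_steps,
)

# One illustrative optimization step on a toy batch.
batch = tokenizer(["An example document."], return_tensors="pt")
hidden = encoder(**batch).last_hidden_state
loss = head(hidden).mean()  # stand-in loss; the real model scores entity-pair relations
loss.backward()
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # clip gradients to a maximal norm of 1.0
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```

Since the warmup length is defined as a fraction of the total number of optimization steps, a real run would compute it from the dataloader length and epoch count rather than the fixed step count assumed here.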