Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling

Authors: Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang (pp. 14612-14620)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on three document-level RE benchmark datasets: DocRED, a recently released large-scale RE dataset, and two datasets CDR and GDA in the biomedical domain. Our ATLOP (Adaptive Thresholding and Localized cOntext Pooling) model achieves an F1 score of 63.4, and also significantly outperforms existing models on both CDR and GDA. Experiments on three document-level relation extraction datasets, DocRED (Yao et al. 2019), CDR (Li et al. 2016), and GDA (Wu et al. 2019b), demonstrate that our ATLOP model significantly outperforms the state-of-the-art methods.
Researcher Affiliation | Collaboration | Wenxuan Zhou (1*), Kevin Huang (2), Tengyu Ma (3), Jing Huang (2). 1 Department of Computer Science, University of Southern California, Los Angeles, CA; 2 JD AI Research, Mountain View, CA; 3 Department of Computer Science, Stanford University, Stanford, CA
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have released our code at https://github.com/wzhouad/ATLOP.
Open Datasets | Yes | We evaluate our ATLOP model on three public document-level relation extraction datasets. The dataset statistics are shown in Table 1. DocRED (Yao et al. 2019) is a large-scale crowdsourced dataset for document-level RE. CDR (Li et al. 2016) is a human-annotated dataset in the biomedical domain. GDA (Wu et al. 2019b) is a large-scale dataset in the biomedical domain.
Dataset Splits | Yes | Table 1: Statistics of the datasets in experiments. DocRED: # Train 3053, # Dev 1000, # Test 1000. CDR: # Train 500, # Dev 500, # Test 500. GDA: # Train 23353, # Dev 5839, # Test 1000. We follow Christopoulou, Miwa, and Ananiadou (2019) to split the training set into an 80/20 split as training and development sets.
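For reference, a minimal sketch of an 80/20 train/development split of the kind described above; the file name, JSON format, random seed, and use of scikit-learn are illustrative assumptions, not details given in the paper.

```python
# Hypothetical 80/20 split of a training file into train and dev sets.
# "train.json" and the fixed seed are assumptions for illustration only.
import json
from sklearn.model_selection import train_test_split

with open("train.json") as f:
    examples = json.load(f)

# 80% training, 20% development, with a fixed seed for reproducibility.
train_set, dev_set = train_test_split(examples, test_size=0.2, random_state=42)
print(f"train: {len(train_set)}  dev: {len(dev_set)}")
```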
Hardware Specification | Yes | All models are trained with 1 Tesla V100 GPU.
Software Dependencies | No | The paper mentions software such as Hugging Face's Transformers and the Apex library, and pre-trained models such as BERT-base, RoBERTa-large, and SciBERT, but does not provide specific version numbers for these software dependencies.
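Since no versions are given, the sketch below only illustrates how the named pre-trained encoders could be loaded with Hugging Face Transformers; the model identifiers are assumptions, not values taken from the paper.

```python
# Minimal sketch of loading the encoders named in the paper with
# Hugging Face Transformers. Model identifiers below are assumptions.
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_NAMES = {
    "bert": "bert-base-cased",                     # assumed DocRED encoder
    "roberta": "roberta-large",                    # assumed DocRED encoder
    "scibert": "allenai/scibert_scivocab_cased",   # assumed biomedical encoder
}

def load_encoder(key: str):
    """Return (config, tokenizer, model) for one of the pre-trained encoders."""
    name = MODEL_NAMES[key]
    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name, config=config)
    return config, tokenizer, model
```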
Experiment Setup | Yes | Table 2: Hyper-parameters in training. Batch size 4, 4, 4, 16. # Epoch 30, 30, 30, 10. lr for encoder 5e-5, 3e-5, 2e-5, 2e-5. lr for classifier 1e-4, 1e-4, 1e-4, 1e-4. Our model is optimized with AdamW (Loshchilov and Hutter 2019) using learning rates {2e-5, 3e-5, 5e-5, 1e-4}, with a linear warmup (Goyal et al. 2017) for the first 6% steps followed by a linear decay to 0. We apply dropout (Srivastava et al. 2014) between layers with rate 0.1, and clip the gradients of model parameters to a max norm of 1.0.
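The training configuration quoted above (AdamW with separate encoder/classifier learning rates, 6% linear warmup followed by linear decay, gradient clipping at max norm 1.0) can be sketched as follows; attribute names such as `model.encoder`, `model.classifier`, and `total_steps`, and the assumption that the model returns its loss, are illustrative and not taken from the paper or its code.

```python
# Hedged sketch of the reported training setup, not the authors' implementation.
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

def build_optimizer_and_scheduler(model, total_steps,
                                  encoder_lr=5e-5, classifier_lr=1e-4):
    # Separate parameter groups so the pre-trained encoder and the new
    # classification head can use different learning rates, as in Table 2.
    param_groups = [
        {"params": model.encoder.parameters(), "lr": encoder_lr},
        {"params": model.classifier.parameters(), "lr": classifier_lr},
    ]
    optimizer = AdamW(param_groups)
    # Linear warmup over the first 6% of updates, then linear decay to 0.
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.06 * total_steps),
        num_training_steps=total_steps,
    )
    return optimizer, scheduler

def training_step(model, batch, optimizer, scheduler):
    loss = model(**batch)  # assumes the forward pass returns the training loss
    loss.backward()
    # Clip gradients to a max norm of 1.0 as stated in the setup.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```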