reproducibilityindex.ai

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling

Authors: Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang | Pages: 14612-14620

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experiment on three document-level RE benchmark datasets: DocRED, a recently released large-scale RE dataset, and two datasets CDR and GDA in the biomedical domain. Our ATLOP (Adaptive Thresholding and Localized cOntext Pooling) model achieves an F1 score of 63.4, and also significantly outperforms existing models on both CDR and GDA. Experiments on three document-level relation extraction datasets, DocRED (Yao et al. 2019), CDR (Li et al. 2016), and GDA (Wu et al. 2019b), demonstrate that our ATLOP model significantly outperforms the state-of-the-art methods.
Researcher Affiliation	Collaboration	Wenxuan Zhou (1*), Kevin Huang (2), Tengyu Ma (3), Jing Huang (2). (1) Department of Computer Science, University of Southern California, Los Angeles, CA; (2) JD AI Research, Mountain View, CA; (3) Department of Computer Science, Stanford University, Stanford, CA
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	We have released our code at https://github.com/wzhouad/ATLOP.
Open Datasets	Yes	We evaluate our ATLOP model on three public document-level relation extraction datasets. The dataset statistics are shown in Table 1. DocRED (Yao et al. 2019) is a large-scale crowdsourced dataset for document-level RE. CDR (Li et al. 2016) is a human-annotated dataset in the biomedical domain. GDA (Wu et al. 2019b) is a large-scale dataset in the biomedical domain.
Dataset Splits	Yes	Table 1: Statistics of the datasets in experiments. DocRED: 3053 train, 1000 dev, 1000 test. CDR: 500 train, 500 dev, 500 test. GDA: 23353 train, 5839 dev, 1000 test. We follow Christopoulou, Miwa, and Ananiadou (2019) to split the training set into an 80/20 split as training and development sets.
Hardware Specification	Yes	All models are trained with 1 Tesla V100 GPU.
Software Dependencies	No	The paper mentions software like Huggingface's Transformers and Apex library, and pre-trained models like BERT-base, RoBERTa-large, and SciBERT, but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	Table 2: Hyper-parameters in training. Batch size: 4, 4, 4, 16. # Epoch: 30, 30, 30, 10. lr for encoder: 5e-5, 3e-5, 2e-5, 2e-5. lr for classifier: 1e-4, 1e-4, 1e-4, 1e-4. Our model is optimized with AdamW (Loshchilov and Hutter 2019) using learning rates {2e-5, 3e-5, 5e-5, 1e-4}, with a linear warmup (Goyal et al. 2017) for the first 6% steps followed by a linear decay to 0. We apply dropout (Srivastava et al. 2014) between layers with rate 0.1, and clip the gradients of model parameters to a max norm of 1.0.
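
The learning-rate schedule quoted in the Experiment Setup row (linear warmup over the first 6% of steps, then linear decay to 0) can be illustrated with a small standalone sketch. This is not taken from the paper or its released code: the function name `linear_warmup_decay`, its parameters, and the 1000-step total in the usage loop are hypothetical choices made for illustration. Only the 6% warmup ratio and the 5e-5 rate come from the table above.

```python
def linear_warmup_decay(step: int, total_steps: int, base_lr: float,
                        warmup_ratio: float = 0.06) -> float:
    """Return the learning rate at a given 0-indexed optimizer step.

    Assumed behavior (illustrative, not the authors' implementation):
    the rate rises linearly from 0 to base_lr over the warmup steps,
    then falls linearly back to 0 at total_steps.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale base_lr by the fraction of warmup completed.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale base_lr by the fraction of post-warmup steps left.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Hypothetical run length; 5e-5 is one of the encoder rates listed above.
    for step in (0, 30, 60, 530, 1000):
        print(step, linear_warmup_decay(step, total_steps=1000, base_lr=5e-5))
```

In a real training loop, a schedule like this would typically be paired with an AdamW optimizer and gradient clipping at max norm 1.0, as the row describes. Those parts are omitted here to keep the sketch free of dependencies.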