Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Authors: Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang | Pages: 14612-14620
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on three document-level RE benchmark datasets: DocRED, a recently released large-scale RE dataset, and two datasets CDR and GDA in the biomedical domain. Our ATLOP (Adaptive Thresholding and Localized cOntext Pooling) model achieves an F1 score of 63.4, and also significantly outperforms existing models on both CDR and GDA. Experiments on three document-level relation extraction datasets, DocRED (Yao et al. 2019), CDR (Li et al. 2016), and GDA (Wu et al. 2019b), demonstrate that our ATLOP model significantly outperforms the state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Wenxuan Zhou,1* Kevin Huang,2 Tengyu Ma,3 Jing Huang2; 1Department of Computer Science, University of Southern California, Los Angeles, CA; 2JD AI Research, Mountain View, CA; 3Department of Computer Science, Stanford University, Stanford, CA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have released our code at https://github.com/wzhouad/ATLOP. |
| Open Datasets | Yes | We evaluate our ATLOP model on three public document-level relation extraction datasets. The dataset statistics are shown in Table 1. DocRED (Yao et al. 2019) is a large-scale crowdsourced dataset for document-level RE. CDR (Li et al. 2016) is a human-annotated dataset in the biomedical domain. GDA (Wu et al. 2019b) is a large-scale dataset in the biomedical domain. |
| Dataset Splits | Yes | Table 1: Statistics of the datasets in experiments. DocRED # Train 3053 # Dev 1000 # Test 1000. CDR # Train 500 # Dev 500 # Test 500. GDA # Train 23353 # Dev 5839 # Test 1000. We follow Christopoulou, Miwa, and Ananiadou (2019) to split the training set into an 80/20 split as training and development sets. |
| Hardware Specification | Yes | All models are trained with 1 Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions software like Huggingface's Transformers and Apex library, and pre-trained models like BERT-base, RoBERTa-large, and SciBERT, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 2: Hyper-parameters in training. Batch size 4, 4, 4, 16. # Epoch 30, 30, 30, 10. lr for encoder 5e-5, 3e-5, 2e-5, 2e-5. lr for classifier 1e-4, 1e-4, 1e-4, 1e-4. Our model is optimized with AdamW (Loshchilov and Hutter 2019) using learning rates {2e-5, 3e-5, 5e-5, 1e-4}, with a linear warmup (Goyal et al. 2017) for the first 6% steps followed by a linear decay to 0. We apply dropout (Srivastava et al. 2014) between layers with rate 0.1, and clip the gradients of model parameters to a max norm of 1.0. |
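
The schedule quoted in the Experiment Setup row (linear warmup over the first 6% of steps, linear decay to 0, gradient clipping at a max norm of 1.0) can be sketched in plain Python. This is an illustrative sketch, not the authors' released code: the function names, the rounding of the warmup length, and the example step count of 1000 are assumptions, and the two base learning rates are simply the first values listed for the encoder and classifier in Table 2.

```python
import math
from typing import List

# Base learning rates taken from the first column of Table 2 as quoted above.
ENCODER_LR = 5e-5
CLASSIFIER_LR = 1e-4


def linear_warmup_decay(step: int, total_steps: int, warmup_ratio: float = 0.06) -> float:
    """Return the multiplier applied to a base learning rate at `step`.

    Assumption: the warmup length is total_steps * warmup_ratio, rounded to
    the nearest integer.
    """
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def clip_by_global_norm(grads: List[float], max_norm: float = 1.0) -> List[float]:
    """Rescale gradients so their global L2 norm does not exceed `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    total_steps = 1000  # hypothetical run length, chosen only for illustration
    for step in (0, 30, 60, 530, 1000):
        factor = linear_warmup_decay(step, total_steps)
        print(step, ENCODER_LR * factor, CLASSIFIER_LR * factor)
```

Both parameter groups share the same multiplier, so the encoder and classifier rates keep a fixed ratio throughout training while each follows the warmup and decay shape.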