Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Document-level Relation Extraction as Semantic Segmentation

Authors: Ningyu Zhang, Xiang Chen, Xin Xie, Shumin Deng, Chuanqi Tan, Mosha Chen, Fei Huang, Luo Si, Huajun Chen

IJCAI 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results show that our approach can obtain state-of-the-art performance on three benchmark datasets DocRED, CDR, and GDA. 4 Experiments
Researcher Affiliation	Collaboration	Ningyu Zhang1,2, Xiang Chen1,2, Xin Xie1,2, Shumin Deng1,2, Chuanqi Tan3, Mosha Chen3, Fei Huang3, Luo Si3, Huajun Chen1,2; 1 Zhejiang University & AZFT Joint Lab for Knowledge Engine; 2 Hangzhou Innovation Center, Zhejiang University; 3 Alibaba Group; EMAIL EMAIL
Pseudocode	No	The paper describes the methodology in text but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The code and datasets are available in https://github.com/zjunlp/DocuNet.
Open Datasets	Yes	We evaluated our DocuNet model on three document-level RE datasets. ... DocRED [Yao et al., 2019] is a large-scale document-level relation extraction dataset by crowdsourcing. ... CDR [Li et al., 2016] is a relation extraction dataset in the biomedical domain... GDA [Wu et al., 2019] is a dataset in the biomedical domain...
Dataset Splits	Yes	DocRED contains 3,053/1,000/1,000 instances for training, validating and test, respectively. We listed the dataset statistics in Table 1.
Hardware Specification	Yes	We trained on one NVIDIA V100 16GB GPU and evaluated our model with Ign F1, and F1 following [Yao et al., 2019].
Software Dependencies	No	Our model was implemented based on Pytorch. We used cased BERT-base, or RoBERTa-large as the encoder on DocRED and SciBERT-base [Beltagy et al., 2019] on CDR and GDA. We optimize our model with AdamW using learning rates 2e-5 with a linear warmup for the first 6% of steps.
Experiment Setup	Yes	We optimize our model with AdamW using learning rates 2e-5 with a linear warmup for the first 6% of steps. We set the matrix size N = 42. The context-based strategy is utilized by default.
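
The learning-rate schedule quoted in the Experiment Setup row (peak rate 2e-5, linear warmup over the first 6% of steps) can be illustrated with a minimal sketch. This is not the authors' code: the function name `lr_at_step` and its parameters are hypothetical, and the quoted text does not say what happens after warmup, so the linear decay to zero used here is an assumption (it is a common pairing with linear warmup).

```python
def lr_at_step(step, total_steps, base_lr=2e-5, warmup_percent=6):
    """Learning rate at a 0-indexed optimizer step.

    Hypothetical helper for illustration only.
    base_lr and warmup_percent follow the quoted setup (2e-5, first 6% of steps).
    """
    # Integer arithmetic avoids floating-point rounding in the step count.
    warmup_steps = total_steps * warmup_percent // 100

    if step < warmup_steps:
        # Linear warmup: rate rises from 0 towards base_lr.
        return base_lr * step / max(1, warmup_steps)

    # ASSUMPTION: linear decay to zero after warmup (not stated in the source).
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)


if __name__ == "__main__":
    total = 1000  # hypothetical run length, chosen only for the demo
    for s in (0, 30, 60, 530, 1000):
        print(s, lr_at_step(s, total))
```

With 1,000 total steps, warmup covers the first 60 steps; the rate reaches its peak at step 60 and, under the assumed decay, falls back to zero at the final step.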