Span Graph Transformer for Document-Level Named Entity Recognition

Authors: Hongli Mao, Xian-Ling Mao, Hanlin Tang, Yu-Ming Shang, Heyan Huang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on both resource-rich nested and flat NER datasets, as well as low-resource distantly supervised NER datasets, demonstrate that the proposed SGT model achieves better performance than previous state-of-the-art models.
Researcher Affiliation | Academia | 1 School of Computer Science & Technology, Beijing Institute of Technology, Beijing, China; 2 Beijing University of Posts and Telecommunications, Beijing, China
Pseudocode | No | The paper describes its methods but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for its methodology.
Open Datasets | Yes | In the resource-rich setting, we perform experiments on three nested NER datasets: ACE 2004, ACE 2005 and Genia (Kim et al. 2003); and two flat NER datasets: CoNLL 2003 (Sang and De Meulder 2003) and OntoNotes 5. For the low-resource distantly supervised setting, we use the BC5CDR dataset with standard train, dev, and test splits.
Dataset Splits | Yes | For ACE 2004 and ACE 2005, we follow the same settings as Lu and Roth (2015) and Muis and Lu (2017) to split the data into train, dev and test sets by 8:1:1 (an illustrative split sketch appears after the table). For Genia, we use the same document split as suggested by Lu and Roth (2015) and Yu et al. (2020).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments; it only mentions the use of pre-trained models.
Software Dependencies | No | The paper mentions using RoBERTa-base and BioBERT-base pre-trained models, but does not specify the version numbers of the underlying software frameworks (e.g., PyTorch, TensorFlow) or other ancillary software.
Experiment Setup | Yes | For the SGT model, the span-overlap weight α is set to 0.9 and the length of the expanded context k_n is 50. The number of Graph Transformer layers L is 4, and the filtering threshold θ is set to 0.5. All parameters are optimized using Adam with a peak learning rate of 2e-5. Final reported results are averages over three runs with different random seeds (a configuration sketch follows the table).
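
The paper reports only the 8:1:1 ratio for ACE 2004 and ACE 2005; the authors reuse the exact document partitions of Lu and Roth (2015) and Muis and Lu (2017) rather than re-splitting. The following is a minimal, hypothetical Python sketch that only illustrates the ratio itself; the function name split_8_1_1 and the random shuffling are assumptions for illustration, not the official partitioning procedure.

    import random

    def split_8_1_1(documents, seed=42):
        # Shuffle a copy so the caller's list is untouched.
        # Assumption: a random split for illustration; the paper instead
        # follows the fixed splits of Lu and Roth (2015).
        docs = list(documents)
        random.Random(seed).shuffle(docs)
        n_train = int(0.8 * len(docs))
        n_dev = int(0.1 * len(docs))
        train = docs[:n_train]
        dev = docs[n_train:n_train + n_dev]
        test = docs[n_train + n_dev:]
        return train, dev, test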
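
The reported hyperparameters can be gathered into a small configuration object. The sketch below is a hypothetical reconstruction, assuming a Python dataclass named SGTConfig with illustrative field names; it is not code released by the authors, only a compact restatement of the values quoted above.

    from dataclasses import dataclass

    @dataclass
    class SGTConfig:  # hypothetical name, for illustration only
        span_overlap_weight: float = 0.9     # alpha, weight of span overlap
        expanded_context_length: int = 50    # k_n, length of the expanded context
        graph_transformer_layers: int = 4    # L, number of Graph Transformer layers
        filtering_threshold: float = 0.5     # theta, span filtering threshold
        peak_learning_rate: float = 2e-5     # Adam peak learning rate
        num_random_seeds: int = 3            # results averaged over three runs

    config = SGTConfig()
    print(config.peak_learning_rate)  # 2e-05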