Span-based Unified Named Entity Recognition Framework via Contrastive Learning

Authors: Hongli Mao, Xian-Ling Mao, Hanlin Tang, Yu-Ming Shang, Xiaoyan Gao, Ao-Jie Ma, Heyan Huang

IJCAI 2024

Reproducibility Variable Result LLM Response
Research Type: Experimental — "Extensive experiments in both supervised and zero/few-shot settings demonstrate that the proposed SUNER model achieves better performance and higher efficiency than previous state-of-the-art unified NER models."
Researcher Affiliation: Academia — 1) School of Computer Science & Technology, Beijing Institute of Technology, Beijing, China; 2) Beijing University of Posts and Telecommunications, Beijing, China; 3) Beijing University of Technology, Beijing, China.
Pseudocode: No — The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No — The paper neither states that its source code is released nor provides a link to a code repository.
Open Datasets: Yes — "We train and evaluate our model on seven existing public NER benchmarks covering diverse domains such as news, biomedicine, movies, and restaurants. The used datasets include three nested NER datasets: ACE 2004, ACE 2005, and GENIA [Kim et al., 2003]; along with four flat NER datasets: CoNLL 2003 [Sang and De Meulder, 2003], OntoNotes 5, MIT Restaurant, and MIT Movie [Liu and Lane, 2017]."
Dataset Splits: Yes — "We use MIT Restaurant and MIT Movie datasets with standard train, dev, and test splits, while adopting the splits of Yu et al. [2020] for the remaining datasets."
Hardware Specification: Yes — "All experiments are conducted on a single GeForce RTX 3090 under the same settings."
Software Dependencies: No — The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup: Yes — "In the span detection module, we set the auxiliary loss weight λ to 0.7 and the biaffine encoder hidden size to 300. The filtering thresholds θ1 and θ2 are set to 0.5 and 0.4, respectively. During training, all parameters are optimized with Adam at a peak learning rate of 1.5e-5, and hyper-parameter tuning is performed on the validation set."
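For reference, the reported hyper-parameter values can be collected into a minimal sketch. Only the numeric values below come from the paper excerpt; the loss combination and the two-stage threshold filter are assumptions about how λ, θ1, and θ2 are applied, since the excerpt does not give the exact formulas.

```python
# Hyper-parameter values reported in the paper's experiment setup.
LAMBDA_AUX = 0.7            # auxiliary loss weight λ
HIDDEN_SIZE = 300           # biaffine encoder hidden size
THETA_1, THETA_2 = 0.5, 0.4 # span filtering thresholds θ1, θ2
PEAK_LR = 1.5e-5            # Adam peak learning rate


def total_loss(main_loss: float, aux_loss: float) -> float:
    """Combine the main loss with the weighted auxiliary loss.

    The additive form L = L_main + λ * L_aux is an assumption; the paper
    excerpt only states the weight λ = 0.7.
    """
    return main_loss + LAMBDA_AUX * aux_loss


def filter_spans(spans):
    """Keep candidate spans whose two scores pass both thresholds.

    `spans` is a list of (start, end, score_1, score_2) tuples. Which score
    each threshold applies to is hypothetical, not taken from the paper.
    """
    return [(s, e) for (s, e, p1, p2) in spans
            if p1 >= THETA_1 and p2 >= THETA_2]
```

Usage: `filter_spans([(0, 2, 0.6, 0.5), (1, 3, 0.4, 0.9)])` keeps only the first span, since the second fails the θ1 = 0.5 check.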