PUnifiedNER: A Prompting-Based Unified NER System for Diverse Datasets

Authors: Jinghui Lu, Rui Zhao, Brian Mac Namee, Fei Tan

AAAI 2023

Reproducibility Assessment (Variable / Result / LLM Response)
Research Type: Experimental
LLM Response: "Experimental results show that PUnifiedNER leads to significant prediction benefits compared to dataset-specific models with impressively reduced model deployment costs. We also perform comprehensive pilot and ablation studies to support in-depth analysis of each component in PUnifiedNER."
Researcher Affiliation: Collaboration
LLM Response: "Jinghui Lu [1], Rui Zhao [1], Brian Mac Namee [2,3], Fei Tan [1]* — [1] SenseTime Research; [2] The Insight Centre for Data Analytics, University College Dublin; [3] School of Computer Science, University College Dublin"
Pseudocode: No
LLM Response: "The paper describes the task reframing using formal equations and examples (e.g., x_input = [s_e, s_p1, ..., s_e, s_pn, s_t, x] (1)), but it does not include a distinct pseudocode block or algorithm section."
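Although the paper provides no pseudocode, the quoted equation (1) describes how the model input is assembled: a separator token s_e before each entity-type prompt s_p1..s_pn, a text marker s_t, then the original sentence x. A minimal sketch of that concatenation follows; the function name and the sentinel tokens (`<extra_id_0>`, `<extra_id_1>`) are illustrative assumptions, not the paper's actual vocabulary.

```python
def build_prompt_input(entity_types, sentence,
                       sep="<extra_id_0>", text_marker="<extra_id_1>"):
    """Assemble x_input = [s_e, s_p1, ..., s_e, s_pn, s_t, x] as one string.

    Hypothetical reconstruction of Eq. (1): a separator token precedes each
    entity-type prompt, a text marker precedes the raw sentence.
    """
    parts = []
    for etype in entity_types:
        parts.append(sep)        # s_e: separator before each entity-type prompt
        parts.append(etype)      # s_p: the entity-type prompt itself
    parts.append(text_marker)    # s_t: marks where the original sentence begins
    parts.append(sentence)       # x: the original input sentence
    return " ".join(parts)

# e.g. build_prompt_input(["person", "location"], "John lives in Dublin")
# -> "<extra_id_0> person <extra_id_0> location <extra_id_1> John lives in Dublin"
```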
Open Source Code: Yes
LLM Response: "Our code and a demo interface for PUnifiedNER have been made available, which demonstrates the capability of on-demand entity recognition. All resources are available at: https://github.com/GeorgeLuImmortal/PUnifiedNER."
Open Datasets: Yes
LLM Response: "We train and evaluate PUnifiedNER on eight existing public NER datasets that target various entity types from different domains including social media, e-commerce, news, postal address, etc. We use the Ecommerce (Ding et al. 2019), MSRA (Levow 2006), OntoNotes 4.0 (Pradhan et al. 2013), People Daily 2014, Boson, Resume (Zhang and Yang 2018), CCKS2021, and CLUENER (Xu et al. 2020) datasets. People Daily 2014 and Boson datasets are available at https://github.com/hspuppy/hugbert/tree/master/ner_dataset. CCKS2021: https://tianchi.aliyun.com/competition/entrance/531900/information"
Dataset Splits: Yes
LLM Response: "The training/validation split is the same as the original setting. We select the best-performing model on a validation set and report its test f-score."
Hardware Specification: No
LLM Response: The paper discusses model parameters (e.g., "BERT-Base (110 million parameters)", "BERT-Large (340 million parameters)", "T5-base parameters, i.e., 220 million") in the context of comparing model sizes for deployment costs. However, it does not specify the actual hardware (e.g., GPU models, CPU types, or memory) used to run the experiments.
Software Dependencies: No
LLM Response: The paper mentions using T5 and a specific pre-trained T5 checkpoint ("T5-v1.1-base-chinese checkpoint pretrained by UER"), but it does not provide version numbers for any other software dependencies or libraries (e.g., Python, PyTorch, TensorFlow) that would be necessary for reproducibility.
Experiment Setup: Yes
LLM Response: "Other hyperparameter settings are detailed in the Appendix. We also use the sampling strategy discussed in the previous section to construct training batches and continuously pretrain T5 for up to 50K steps, evaluating every 1K steps."
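The quoted schedule (train up to 50K steps, evaluate every 1K, keep the best validation model) can be sketched as a generic loop. This is an assumption-laden illustration, not the authors' code: `sample_batch`, `train_step`, and `evaluate` are placeholder callables standing in for the paper's unspecified sampling strategy, a T5 optimization step, and validation f-score computation.

```python
def train_loop(sample_batch, train_step, evaluate,
               total_steps=50_000, eval_every=1_000):
    """Illustrative schedule: train up to `total_steps`, evaluate every
    `eval_every` steps, and track the best validation score seen so far.

    The three callables are hypothetical stand-ins for the paper's batch
    sampling strategy, one optimizer step, and validation f-score.
    """
    best_f1, best_step = float("-inf"), None
    for step in range(1, total_steps + 1):
        train_step(sample_batch())       # one batch via the sampling strategy
        if step % eval_every == 0:
            f1 = evaluate()              # validation f-score at this checkpoint
            if f1 > best_f1:             # keep the best-performing checkpoint
                best_f1, best_step = f1, step
    return best_f1, best_step
```

In practice the checkpoint at `best_step` would be the one whose test f-score is reported, matching the paper's "select the best performing model on a validation set" protocol.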