A Neural Span-Based Continual Named Entity Recognition Model
Authors: Yunan Zhang, Qingcai Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic CL datasets derived from OntoNotes and Few-NERD show that SpanKL significantly outperforms the previous SoTA in many aspects and obtains the smallest gap from CL to the upper bound, revealing its high practical value. |
| Researcher Affiliation | Academia | Yunan Zhang¹, Qingcai Chen¹,²*; ¹Harbin Institute of Technology (Shenzhen), Shenzhen, China; ²Peng Cheng Laboratory, Shenzhen, China; yunanzhang0@outlook.com, qingcai.chen@hit.edu.cn |
| Pseudocode | No | The paper describes the model architecture and training process in textual paragraphs and uses a diagram (Figure 2), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Qznan/SpanKL. |
| Open Datasets | Yes | OntoNotes-5.0 English is annotated for 18 entity types; we follow recent works in selecting the following types to ensure sufficient samples for training: Organization (ORG), Person (PER), Geo-Political Entity (GPE), Date (DATE), Cardinal (CARD), Nationalities and Religious Political Group (NORP). Each type is assigned to a synthetic CL task. Few-NERD (SUP) is hierarchically annotated with 8 coarse-grained and 66 fine-grained entity types. It is proposed for few-shot research, but we adopt the normal supervised full version. We construct each task from each coarse-grained type, so each task contains its related multiple fine-grained entity types that will be evaluated. |
| Dataset Splits | Yes | To separate the original training/dev set into a series of CL tasks, Monaikul et al. (2021) commonly divide samples randomly into disjoint tasks, while Xia et al. (2022) typically filter the samples containing the entity types to be learned in that task to compose its dataset; we refer to these strategies as Split and Filter, respectively (a hedged sketch of both strategies follows the table). Given a certain task at each step, we train the models on its training set and report the performance of the following metrics on its test set, based on the best performance on its dev set. |
| Hardware Specification | Yes | We set batch sizes of 32 and 24 and maximum epochs of 10 and 5 on OntoNotes and Few-NERD, respectively, to train on a V100 GPU. |
| Software Dependencies | No | The paper mentions using "bert-large-cased from Hugging Face" and "AdamW optimizer", but does not provide specific version numbers for software libraries like Python, PyTorch, or the Hugging Face Transformers library. |
| Experiment Setup | Yes | We set α = β = 1 for all models. All parameters are fine-tuned by the AdamW optimizer (Loshchilov and Hutter 2017), with learning rates (lr) of 5e-5 for the BERT encoder and 1e-3 for the remaining networks. The lr is scheduled by warmup over the first 200 steps followed by a cosine decay. We limit sentences to a maximum length of 512... We set batch sizes of 32 and 24 and maximum epochs of 10 and 5 on OntoNotes and Few-NERD, respectively, to train on a V100 GPU (a hedged sketch of this optimizer/scheduler setup follows the table). |
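
As referenced in the Dataset Splits row, the following is a minimal sketch of the two CL task-construction strategies quoted above (Split and Filter). The sample format, function names, and the per-type task list are illustrative assumptions, not the authors' released preprocessing (see https://github.com/Qznan/SpanKL for the actual code).

```python
# Hypothetical sketch of the "Split" and "Filter" task-construction strategies.
# Assumes each sample is a dict with an "entities" list of {"type": ...} dicts.
import random

def split_tasks(samples, num_tasks, seed=42):
    """Split: randomly divide samples into disjoint per-task subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [shuffled[i::num_tasks] for i in range(num_tasks)]

def filter_tasks(samples, task_entity_types):
    """Filter: for each task, keep the samples containing its entity types."""
    tasks = []
    for types in task_entity_types:
        tasks.append([s for s in samples
                      if any(ent["type"] in types for ent in s["entities"])])
    return tasks

# Example: OntoNotes-style setup with one entity type per CL task,
# using the six types listed in the Open Datasets row.
ontonotes_tasks = [["ORG"], ["PER"], ["GPE"], ["DATE"], ["CARD"], ["NORP"]]
```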
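The Experiment Setup row quotes the optimizer and schedule; below is a minimal PyTorch/Transformers sketch of that configuration (AdamW with lr 5e-5 for the BERT encoder and 1e-3 for the remaining layers, 200 warmup steps, then cosine decay). The parameter-name prefix `bert` and the helper name are assumptions for illustration, not taken from the authors' code.

```python
# Assumed reconstruction of the quoted training configuration.
import torch
from transformers import get_cosine_schedule_with_warmup

def build_optimizer_and_scheduler(model, num_training_steps):
    # Assumes the BERT encoder's parameters are named with a "bert" prefix.
    encoder_params = [p for n, p in model.named_parameters() if n.startswith("bert")]
    other_params = [p for n, p in model.named_parameters() if not n.startswith("bert")]

    # Two parameter groups: 5e-5 for the encoder, 1e-3 for the rest.
    optimizer = torch.optim.AdamW([
        {"params": encoder_params, "lr": 5e-5},
        {"params": other_params, "lr": 1e-3},
    ])

    # 200 warmup steps followed by cosine decay, as quoted in the setup.
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=200, num_training_steps=num_training_steps)
    return optimizer, scheduler

# Usage: num_training_steps would be len(train_loader) * max_epochs, e.g.
# with batch size 32 and 10 epochs on OntoNotes as quoted above.
```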