Continual Learning for Named Entity Recognition
Authors: Natawut Monaikul, Giuseppe Castellucci, Simone Filice, Oleg Rokhlenko | pp. 13570-13577
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that this approach allows the student model to progressively learn to identify new entity types without forgetting the previously learned ones. We also present a comparison with multiple strong baselines to demonstrate that our approach is superior for continually updating an NER model. |
| Researcher Affiliation | Collaboration | Natawut Monaikul,1 Giuseppe Castellucci,2 Simone Filice,2 Oleg Rokhlenko2 1University of Illinois at Chicago, Chicago, IL, USA 2Amazon, Seattle, WA, USA |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about the release of its own source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | To evaluate our approach, we used two well-known NER datasets: CoNLL-03 English NER (Tjong Kim Sang and De Meulder 2003) and OntoNotes (Hovy et al. 2006). |
| Dataset Splits | Yes | We divided the official training and validation sets of CoNLL-03 and OntoNotes into four and six disjoint subsets, D1, D2, . . ., respectively: each Di is annotated only for the entity type ei. We first train an initial model M1 on D1 for e1. This model becomes the teacher for e1 with which we train a student model M2 on the second slice D2, which is labeled for e2 only: M2 thus learns to tag both e1 and e2. We repeat this process for each slice Di, i.e., training a new student on a new slice using the previously trained model as the teacher for the previously learned labels. At each step i, we use the i-th slice of the validation set for early stopping and evaluate the resulting model Mi on the official test set annotated for the entity types {e1, ..., ei}. |
| Hardware Specification | Yes | training was performed on a single Nvidia V100 GPU. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al. 2017)' and 'BERT Huggingface implementation (Wolf et al. 2019)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | After initial experimentation with different hyperparameters, we chose to train the models with a batch size of 32, a max sentence length of 50 tokens, and a learning rate of 5e-5 for 20 epochs with early stopping (patience=3). For all student models, a temperature Tm = 2 was used, and α = β = 1 for the weighted sum of the losses. |
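The Dataset Splits row describes the paper's slice-by-slice teacher-student procedure: each new student is trained on a slice annotated for one new entity type, with the previous model acting as teacher for all earlier types. The skeleton below sketches that outer loop only; `train_student` is a hypothetical placeholder (the real step fine-tunes a BERT tagger with a distillation loss), and the slice/type names merely illustrate a CoNLL-03-style setup.

```python
def train_student(teacher_types, slice_name, new_type):
    """Hypothetical stand-in for one continual-learning step: the student is
    trained on the slice's gold labels for new_type, while the previous model
    (the teacher) supervises the old types via its soft predictions."""
    # The student ends up covering the teacher's types plus the new one.
    return list(teacher_types) + [new_type]

# Illustrative CoNLL-03-style setup: four disjoint slices, one entity type each.
slices = [("D1", "PER"), ("D2", "LOC"), ("D3", "ORG"), ("D4", "MISC")]

model_types = []  # M0: no entity types learned yet
for slice_name, entity_type in slices:
    # The previous student becomes the teacher for all earlier types.
    model_types = train_student(model_types, slice_name, entity_type)

# After the last slice, M4 should tag all four entity types.
```

At each step i the paper additionally early-stops on the i-th validation slice and evaluates Mi on the official test set restricted to {e1, ..., ei}; those details are omitted from this sketch.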
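The Experiment Setup row reports a temperature Tm = 2 and weights α = β = 1 for a weighted sum of losses, but the table excerpt does not spell out the loss itself. The following is a minimal sketch of a standard knowledge-distillation objective under those settings (function names and the exact KL formulation are assumptions, not taken from the paper): a cross-entropy term on the new type's gold label plus a temperature-softened KL term against the teacher's distribution, for a single token's logits.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, gold_idx,
                      T=2.0, alpha=1.0, beta=1.0):
    """Hypothetical single-token KD loss: alpha * CE(gold) + beta * KL(teacher || student),
    with both distributions softened by temperature T (the paper uses T=2, alpha=beta=1)."""
    # Cross-entropy on the gold label for the newly introduced entity type.
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[gold_idx])
    # KL divergence from the teacher's softened distribution to the student's,
    # which preserves the previously learned types.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kd = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return alpha * ce + beta * kd
```

When the student exactly matches the teacher, the KL term vanishes and the loss reduces to the plain cross-entropy term, which is a quick sanity check on the implementation.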