Transfer Learning for Sequence Labeling Using Source Model and Target Data

Authors: Lingzhen Chen, Alessandro Moschitti

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on Named Entity Recognition show that (i) the learned knowledge in the source model can be effectively transferred when the target data contains new categories and (ii) our neural adapter further improves such transfer. We analyze the performance of both our methods by testing the transfer of different categories. Our main contribution is therefore twofold: firstly, we show that our pre-training and fine-tuning method can utilize well the learned knowledge for the target labeling task while being able to learn new knowledge. Secondly, we show that our proposed neural adapter has the ability to mitigate the forgetting of previously learned knowledge, to combat the annotation disagreement, and to further improve the transferred model performance.
Researcher Affiliation | Collaboration | Lingzhen Chen, University of Trento, Povo, Italy, lingzhen.chen@unitn.it; Alessandro Moschitti*, Amazon, Manhattan Beach, CA, USA, amosch@amazon.com
Pseudocode | Yes | Algorithm 1: Source Model Training; Algorithm 2: Parameter Transfer; Algorithm 3: Target Model Training
Open Source Code | Yes | We made our source code and the exact partition of our dataset available for further research: https://github.com/liah-chan/transfer_NER
Open Datasets | Yes | We primarily used the CONLL 2003 NER dataset (https://www.clips.uantwerpen.be/conll2003/ner/) for our experiments. [...] we also carry out experiments on I-CAB (Italian Content Annotation Bank, http://ontotext.fbk.eu/icab.html).
Dataset Splits | Yes | For the purpose of our experiment, we divide the CONLL train set in 80%/20% as DS and DT, for the initial and subsequent steps of the experiments. [...] A summary of label statistics of these two datasets is shown in Table 1. Table 1: Number of entities in CONLL dataset (in English) and I-CAB dataset (in Italian); the Valid counts are reported as DS / DT.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions TensorFlow but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We use 100 dimension GloVe pretrained embedding for English and Italian to initialize the weights of the embedding layer. [...] The character embedding lookup table is randomly initialized with embedding size of 25. The hidden size of the character-level BLSTM is 25 while the word level one is 128. We apply a dropout regularization on the word embeddings with a rate of 0.5. All models are implemented in TensorFlow (Abadi et al. 2015), as an extension of NeuroNER (Dernoncourt, Lee, and Szolovits 2017). We use Adam (Kingma and Ba 2014) optimizer with a learning rate of 0.001, gradient clipping of 50.0 to minimize the categorical cross entropy, and a maximum epoch number of 100 at each step.
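
The 80%/20% partition of the CoNLL train set into DS and DT described under Dataset Splits above can be approximated with a few lines of Python. The file path and the CoNLL-reading helper below are illustrative assumptions; the authors publish their exact partition in the linked repository, which should be preferred for replication.

```python
# Hypothetical sketch of the 80%/20% split of the CoNLL 2003 train set into
# D_S (source) and D_T (target). The path "eng.train" and the reader are
# assumptions; the authors' exact partition is available in their repository.

def read_conll_sentences(path):
    """Read a CoNLL-formatted file into a list of sentences (lists of token rows)."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # a blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                current.append(line.split())  # e.g. [token, POS, chunk, NER tag]
    if current:
        sentences.append(current)
    return sentences

sentences = read_conll_sentences("eng.train")  # CoNLL 2003 English train file
cut = int(0.8 * len(sentences))
d_source = sentences[:cut]   # D_S: trains the source model
d_target = sentences[cut:]   # D_T: used in the transfer / fine-tuning step
```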
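
The hyper-parameters quoted under Experiment Setup above, together with the three algorithms named under Pseudocode (source model training, parameter transfer, target model training), can be assembled into a rough end-to-end sketch. This is a minimal illustration in tf.keras, not the authors' NeuroNER-based implementation: the vocabulary sizes, sequence lengths, tag counts, and dummy data are placeholders, GloVe initialization is omitted, and the neural adapter is left out because its architecture is not quoted in this summary.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hyper-parameters quoted from the paper.
WORD_EMB_DIM = 100    # 100-d word embeddings (GloVe-initialized in the paper)
CHAR_EMB_DIM = 25     # randomly initialized character embeddings
CHAR_LSTM_DIM = 25    # character-level BLSTM hidden size
WORD_LSTM_DIM = 128   # word-level BLSTM hidden size
DROPOUT_RATE = 0.5    # dropout on the word embeddings
LEARNING_RATE = 0.001
CLIP_NORM = 50.0

# Placeholder sizes (assumptions, not from the paper).
VOCAB_SIZE, CHAR_VOCAB_SIZE = 20000, 100
MAX_SENT_LEN, MAX_WORD_LEN = 50, 20


def build_tagger(num_tags):
    """Word + character BLSTM tagger with a softmax output over num_tags labels."""
    word_ids = layers.Input(shape=(MAX_SENT_LEN,), dtype="int32", name="word_ids")
    char_ids = layers.Input(shape=(MAX_SENT_LEN, MAX_WORD_LEN), dtype="int32",
                            name="char_ids")

    # Word embeddings (randomly initialized here; GloVe in the paper) + dropout.
    word_emb = layers.Embedding(VOCAB_SIZE, WORD_EMB_DIM)(word_ids)
    word_emb = layers.Dropout(DROPOUT_RATE)(word_emb)

    # Character embeddings fed to a per-word character-level BLSTM.
    char_emb = layers.Embedding(CHAR_VOCAB_SIZE, CHAR_EMB_DIM)(char_ids)
    char_repr = layers.TimeDistributed(
        layers.Bidirectional(layers.LSTM(CHAR_LSTM_DIM)))(char_emb)

    # Word-level BLSTM over the concatenated word/character representations.
    features = layers.Concatenate()([word_emb, char_repr])
    features = layers.Bidirectional(
        layers.LSTM(WORD_LSTM_DIM, return_sequences=True))(features)
    outputs = layers.Dense(num_tags, activation="softmax")(features)

    model = Model(inputs=[word_ids, char_ids], outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE,
                                           clipnorm=CLIP_NORM),
        loss="sparse_categorical_crossentropy")
    return model


def dummy_batch(n, num_tags):
    """Random integer arrays standing in for encoded sentences and tags."""
    x = [np.random.randint(1, VOCAB_SIZE, size=(n, MAX_SENT_LEN)),
         np.random.randint(1, CHAR_VOCAB_SIZE, size=(n, MAX_SENT_LEN, MAX_WORD_LEN))]
    y = np.random.randint(0, num_tags, size=(n, MAX_SENT_LEN))
    return x, y


NUM_SOURCE_TAGS, NUM_TARGET_TAGS = 5, 9   # target tag set adds new categories

# Algorithm 1 (Source Model Training): pre-train on D_S.
source_model = build_tagger(NUM_SOURCE_TAGS)
x_s, y_s = dummy_batch(32, NUM_SOURCE_TAGS)
source_model.fit(x_s, y_s, epochs=1, verbose=0)   # up to 100 epochs in the paper

# Algorithm 2 (Parameter Transfer): copy every layer except the output layer
# into a target model whose output covers the (larger) target tag set.
target_model = build_tagger(NUM_TARGET_TAGS)
for src, tgt in zip(source_model.layers[:-1], target_model.layers[:-1]):
    tgt.set_weights(src.get_weights())

# Algorithm 3 (Target Model Training): fine-tune on D_T.
x_t, y_t = dummy_batch(32, NUM_TARGET_TAGS)
target_model.fit(x_t, y_t, epochs=1, verbose=0)
```

The sketch trains for a single epoch on random data purely to exercise the pipeline; the paper reports Adam with gradient clipping of 50.0 and a maximum of 100 epochs at each step on the actual DS and DT partitions.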