Zero-Resource Cross-Lingual Named Entity Recognition
Authors: M Saiful Bari, Shafiq Joty, Prathyusha Jwalapuram
AAAI 2020, pp. 7415-7423 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair. |
| Researcher Affiliation | Collaboration | Nanyang Technological University, Singapore; Salesforce Research Asia, Singapore; {bari0001, jwal0001}@e.ntu.edu.sg, srjoty@ntu.edu.sg |
| Pseudocode | Yes | Algorithm 1 provides the pseudocode of our training method. |
| Open Source Code | Yes | We have released our code for research purposes.2 (Footnote 2: https://github.com/ntunlp/Zero-Shot-Cross-Lingual-NER) |
| Open Datasets | Yes | The data for English is from the CoNLL-2003 shared task for NER (Sang and Meulder 2003), while the data for Spanish and Dutch is from the CoNLL-2002 shared task for NER (Sang 2002). We collected the Finnish NER dataset from (Ruokolainen et al. 2019) and refactored a few tags. For Arabic, we use AQMAR Arabic Wikipedia Named Entity Corpus (Mohit et al. 2012). |
| Dataset Splits | Yes | Table 1: Training, Test and Development splits for different datasets. We exclude document start tags (DOCSTART). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using FastText embeddings and building on the architecture of Lample et al. (2016), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, or specific library versions). |
| Experiment Setup | Yes | We only use sentences with a maximum length of 250 words for training on the source language data. We use FastText embeddings (Grave et al. 2018)... and SGD with a gradient clipping of 5.0 to train the model. The initial learning rate of lr0 = 0.1 and decay = 0.01 worked well with a dropout rate of 0.5. We trained the model for 30 epochs while using a batch size of 16, and evaluated the model after every 150 batches. The sizes of the character embeddings and char-LSTM hidden states were set to 25. Our word LSTM's hidden size was set to 100. The details of the hyperparameters are given in our Github repository. |
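
The Experiment Setup row above lists concrete hyperparameters (max sentence length 250, char embedding and char-LSTM size 25, word-LSTM size 100, dropout 0.5, SGD with lr0 = 0.1, decay = 0.01, gradient clipping 5.0, 30 epochs, batch size 16, evaluation every 150 batches). The following is a minimal sketch of how that configuration could be wired into a PyTorch-style training loop for a Lample et al. (2016) BiLSTM-CRF tagger. It is not the authors' released code: the `BiLSTMCRF`-style model, its `neg_log_likelihood` method, and the inverse-time decay schedule are illustrative assumptions; the paper states lr0 and decay but not the exact schedule.

```python
# Sketch of the reported training setup under the assumptions stated above.
from torch import nn, optim

CONFIG = {
    "max_sent_len": 250,      # training sentences longer than this are dropped
    "char_emb_dim": 25,       # character embedding size
    "char_lstm_hidden": 25,   # char-LSTM hidden state size
    "word_lstm_hidden": 100,  # word-LSTM hidden state size
    "dropout": 0.5,
    "lr0": 0.1,               # initial SGD learning rate
    "decay": 0.01,            # learning-rate decay factor
    "grad_clip": 5.0,         # gradient clipping threshold
    "epochs": 30,
    "batch_size": 16,
    "eval_every": 150,        # evaluate after every 150 batches
}

def lr_at(step: int) -> float:
    # Assumed inverse-time decay: lr = lr0 / (1 + decay * step).
    return CONFIG["lr0"] / (1.0 + CONFIG["decay"] * step)

def train(model: nn.Module, batches, evaluate):
    optimizer = optim.SGD(model.parameters(), lr=CONFIG["lr0"])
    step = 0
    for _ in range(CONFIG["epochs"]):
        for batch in batches:
            optimizer.zero_grad()
            loss = model.neg_log_likelihood(batch)  # hypothetical CRF loss method
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), CONFIG["grad_clip"])
            for group in optimizer.param_groups:    # apply the decayed learning rate
                group["lr"] = lr_at(step)
            optimizer.step()
            step += 1
            if step % CONFIG["eval_every"] == 0:
                evaluate(model)                     # periodic dev-set evaluation
```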