Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources
Authors: Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board. |
| Researcher Affiliation | Collaboration | 1Beijing National Research Center for Information Science and Technology (BNRist) Department of Automation, Tsinghua University, Beijing, China 2Microsoft Research, Beijing, China 3School of Software, Tsinghua University, Beijing, China wuqianhui@tsinghua.org.cn, {zijlin, guow, borje.karlsson, cyl}@microsoft.com jichenhui2012@gmail.com, hbq@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1 Enhanced Meta-Learning for Cross-Lingual NER with Minimal Resources |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | We conduct experiments on four benchmark datasets: CoNLL-2002 Spanish and Dutch NER (Tjong Kim Sang 2002), CoNLL-2003 English and German NER (Tjong Kim Sang and De Meulder 2003), Europeana Newspapers French NER (Neudecker 2016), and MSRA Chinese NER (Cao et al. 2018). |
| Dataset Splits | Yes | All datasets are split into a training set, a development set (testa) and a test set (testb). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We implement our approach with PyTorch 1.0.1. |
| Experiment Setup | Yes | We use the cased multilingual BERT-Base with 12 Transformer blocks, 768 hidden units, 12 self-attention heads, GELU activations (Hendrycks and Gimpel 2016), a dropout rate of 0.1 and learned positional embeddings. Specifically, for sequence length, we employ a sliding window with a maximum length of 128. ... We select K = 2 similar examples for both pseudo-NER task construction and the adaptation phase. The mask ratio is set to 0.2, λ in Equation 13 is set to 2.0, update steps n in Equation 7 is set to 2, the number of sampled pseudo-NER tasks used for one meta-update is set to 32, and the maximum meta-update steps is set to 3 × 10^3. ... For the optimizers of both inner-update and meta-update, we use Adam (Kingma and Ba 2015) with a learning rate of α = β = 3e-5, while for gradient updates during adaptation, we set the learning rate γ to 1e-5. |
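To make the quoted setup concrete, below is a minimal first-order sketch of the inner-update/meta-update loop it describes, assuming a PyTorch-style workflow. The paper releases no code, so `build_model` and `sample_pseudo_ner_tasks` are hypothetical placeholders, the model is assumed to return a scalar loss when called on a batch, and the first-order gradient accumulation is an illustrative simplification rather than the paper's exact Algorithm 1. The learning rates map to the quoted symbols as `lr_inner` = α, `lr_meta` = β, and `lr_adapt` = γ.

```python
import copy

from torch.optim import Adam

# Reported hyperparameters (dictionary keys are illustrative, not from the authors' code).
config = {
    "max_seq_length": 128,        # sliding window maximum length
    "K": 2,                       # similar examples for pseudo-task construction / adaptation
    "mask_ratio": 0.2,
    "lambda": 2.0,                # weight in the paper's Equation 13
    "inner_update_steps": 2,      # n in the paper's Equation 7
    "tasks_per_meta_update": 32,
    "max_meta_update_steps": 3000,
    "lr_inner": 3e-5,             # alpha
    "lr_meta": 3e-5,              # beta
    "lr_adapt": 1e-5,             # gamma, used during target-language adaptation
}

model = build_model(config)                      # hypothetical: mBERT encoder + NER head
meta_optimizer = Adam(model.parameters(), lr=config["lr_meta"])

for step in range(config["max_meta_update_steps"]):
    # Hypothetical sampler returning (support, query) batches for each pseudo-NER task.
    tasks = sample_pseudo_ner_tasks(config["tasks_per_meta_update"], k=config["K"])
    meta_optimizer.zero_grad()
    for support_batch, query_batch in tasks:
        # Inner update: copy the shared initialization and take n gradient steps
        # on the task's support set.
        learner = copy.deepcopy(model)
        inner_optimizer = Adam(learner.parameters(), lr=config["lr_inner"])
        for _ in range(config["inner_update_steps"]):
            inner_optimizer.zero_grad()
            learner(**support_batch).backward()  # assumes the model returns a scalar loss
            inner_optimizer.step()
        # Meta update: evaluate the adapted copy on the query set and accumulate
        # (first-order) gradients back into the shared initialization.
        learner.zero_grad()
        (learner(**query_batch) / len(tasks)).backward()
        for shared, adapted in zip(model.parameters(), learner.parameters()):
            if adapted.grad is not None:
                shared.grad = adapted.grad.clone() if shared.grad is None else shared.grad + adapted.grad
    meta_optimizer.step()

# At test time, adaptation to a target-language example would use a separate optimizer
# with lr=config["lr_adapt"] (gamma) on the K retrieved similar examples.
```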