Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources

Authors: Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin

AAAI 2020, pp. 9274-9281 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.
Researcher Affiliation | Collaboration | (1) Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China; (2) Microsoft Research, Beijing, China; (3) School of Software, Tsinghua University, Beijing, China. Contact: wuqianhui@tsinghua.org.cn, {zijlin, guow, borje.karlsson, cyl}@microsoft.com, jichenhui2012@gmail.com, hbq@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Enhanced Meta-Learning for Cross-Lingual NER with Minimal Resources (an illustrative training-loop sketch follows the table).
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | We conduct experiments on four benchmark datasets: CoNLL-2002 Spanish and Dutch NER (Tjong Kim Sang 2002), CoNLL-2003 English and German NER (Tjong Kim Sang and De Meulder 2003), Europeana Newspapers French NER (Neudecker 2016), and MSRA Chinese NER (Cao et al. 2018).
Dataset Splits | Yes | All datasets are split into a training set, a development set (testa), and a test set (testb).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | Yes | We implement our approach with PyTorch 1.0.1.
Experiment Setup | Yes | We use the cased multilingual BERT-Base with 12 Transformer blocks, 768 hidden units, 12 self-attention heads, GELU activations (Hendrycks and Gimpel 2016), a dropout rate of 0.1, and learned positional embeddings. Specifically, for sequence length, we employ a sliding window with a maximum length of 128. ... We select K = 2 similar examples for both pseudo-NER task construction and the adaptation phase. The mask ratio is set to 0.2, λ in Equation 13 is set to 2.0, the number of update steps n in Equation 7 is set to 2, the number of sampled pseudo-NER tasks used for one meta-update is set to 32, and the maximum number of meta-update steps is set to 3 × 10^3. ... For both the inner-update and meta-update optimizers, we use Adam (Kingma and Ba 2015) with learning rates α = β = 3e-5, while for gradient updates during adaptation we set the learning rate γ to 1e-5.
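
To make the reported procedure concrete, below is a minimal, self-contained sketch of a MAML-style meta-training loop of the kind outlined in Algorithm 1, wired to the hyperparameters quoted under Experiment Setup. The linear tagger, random pseudo-task sampler, and loss stub are hypothetical stand-ins for the paper's multilingual-BERT tagger and similarity-based task construction, and the meta-gradient is a first-order approximation; this is not the authors' implementation.

```python
# Hedged sketch of a MAML-style meta-training loop (Algorithm 1 flavour), using
# the hyperparameters reported under "Experiment Setup". NERModel, the task
# sampler, and the loss below are toy stand-ins, not the paper's code.
import copy
import torch
import torch.nn as nn
from torch.optim import Adam

ALPHA = 3e-5                 # inner-update learning rate (alpha)
BETA = 3e-5                  # meta-update learning rate (beta)
INNER_STEPS = 2              # update steps n (Equation 7)
TASKS_PER_META_UPDATE = 32   # pseudo-NER tasks sampled per meta-update
MAX_META_STEPS = 3           # 3e3 in the paper; kept tiny so the sketch runs fast
NUM_LABELS, HIDDEN = 9, 32   # stand-ins for the tag-set size and encoder width

# Stand-in tagger: one linear layer over precomputed token features
# (the paper fine-tunes cased multilingual BERT-Base instead).
model = nn.Linear(HIDDEN, NUM_LABELS)
meta_optimizer = Adam(model.parameters(), lr=BETA)

def sample_pseudo_task(seq_len=16):
    """Hypothetical sampler: returns (support, query) batches of token
    features and gold labels for one pseudo-NER task."""
    def batch(n):
        return torch.randn(n, seq_len, HIDDEN), torch.randint(NUM_LABELS, (n, seq_len))
    return batch(2), batch(1)   # K = 2 similar examples as support, 1 query

def ner_loss(tagger, batch):
    """Token-level cross-entropy; the paper adds masking and a max-loss term."""
    feats, labels = batch
    logits = tagger(feats)
    return nn.functional.cross_entropy(logits.view(-1, NUM_LABELS), labels.view(-1))

for meta_step in range(MAX_META_STEPS):
    meta_optimizer.zero_grad()
    for _ in range(TASKS_PER_META_UPDATE):
        support, query = sample_pseudo_task()

        # Inner loop: adapt a copy of the model on the task's support set.
        fast_model = copy.deepcopy(model)
        inner_optimizer = Adam(fast_model.parameters(), lr=ALPHA)
        for _ in range(INNER_STEPS):
            inner_optimizer.zero_grad()
            ner_loss(fast_model, support).backward()
            inner_optimizer.step()

        # Outer step (first-order): gradients of the query loss w.r.t. the
        # adapted weights are accumulated onto the original parameters.
        inner_optimizer.zero_grad()
        (ner_loss(fast_model, query) / TASKS_PER_META_UPDATE).backward()
        for p, fp in zip(model.parameters(), fast_model.parameters()):
            p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad

    meta_optimizer.step()
```

A faithful reproduction would back-propagate through the inner updates rather than use this first-order shortcut, and would include the paper's masking scheme and λ-weighted maximum-loss term, which the stubs above omit.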
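A companion sketch of the adaptation phase, continuing the stubs above: before labelling a target-language sentence, a copy of the meta-learned model is briefly fine-tuned on the sentence's K = 2 most similar source-language examples with learning rate γ = 1e-5, as reported in the Experiment Setup row. The similarity retrieval is replaced here by the random sampler stub, and the single adaptation step is an assumption.

```python
# Hedged sketch of test-time adaptation; reuses copy, Adam, ner_loss, and
# sample_pseudo_task from the training-loop sketch above.
GAMMA = 1e-5   # adaptation learning rate (gamma)

def adapt_and_predict(meta_model, support, target_features, adapt_steps=1):
    """Adapt a copy of the meta-learned model on the target sentence's K most
    similar source examples (passed in as `support`), then tag the sentence."""
    adapted = copy.deepcopy(meta_model)
    optimizer = Adam(adapted.parameters(), lr=GAMMA)
    for _ in range(adapt_steps):
        optimizer.zero_grad()
        ner_loss(adapted, support).backward()
        optimizer.step()
    with torch.no_grad():
        return adapted(target_features).argmax(dim=-1)   # label ids per token

# Usage with the stand-ins above: the random sampler plays the role of
# similarity-based retrieval over the source-language training data.
support, (query_feats, _) = sample_pseudo_task()
predictions = adapt_and_predict(model, support, query_feats[0])
```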