Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources

Authors: Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin

AAAI 2020, pp. 9274-9281 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.
Researcher Affiliation | Collaboration | (1) Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China; (2) Microsoft Research, Beijing, China; (3) School of Software, Tsinghua University, Beijing, China. Contact: wuqianhui@tsinghua.org.cn, {zijlin, guow, borje.karlsson, cyl}@microsoft.com, jichenhui2012@gmail.com, hbq@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Enhanced Meta-Learning for Cross-Lingual NER with Minimal Resources (an illustrative training-loop sketch follows the table).
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | We conduct experiments on four benchmark datasets: CoNLL-2002 Spanish and Dutch NER (Tjong Kim Sang 2002), CoNLL-2003 English and German NER (Tjong Kim Sang and De Meulder 2003), Europeana Newspapers French NER (Neudecker 2016), and MSRA Chinese NER (Cao et al. 2018).
Dataset Splits | Yes | All datasets are split into a training set, a development set (testa), and a test set (testb).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | Yes | We implement our approach with PyTorch 1.0.1.
Experiment Setup | Yes | We use the cased multilingual BERT-Base with 12 Transformer blocks, 768 hidden units, 12 self-attention heads, GELU activations (Hendrycks and Gimpel 2016), a dropout rate of 0.1, and learned positional embeddings. Specifically, for sequence length, we employ a sliding window with a maximum length of 128. ... We select K = 2 similar examples for both pseudo-NER task construction and the adaptation phase. The mask ratio is set to 0.2, λ in Equation 13 is set to 2.0, the number of update steps n in Equation 7 is set to 2, the number of sampled pseudo-NER tasks used for one meta-update is set to 32, and the maximum number of meta-update steps is set to 3 × 10^3. ... For both the inner-update and meta-update optimizers, we use Adam (Kingma and Ba 2015) with learning rates α = β = 3e-5, while for gradient updates during adaptation we set the learning rate γ to 1e-5.
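
To make the reported procedure concrete, below is a minimal, self-contained sketch of a MAML-style meta-training loop of the kind outlined in Algorithm 1, wired to the hyperparameters quoted under Experiment Setup. The linear tagger, random pseudo-task sampler, and loss stub are hypothetical stand-ins for the paper's multilingual-BERT tagger and similarity-based task construction, and the meta-gradient is a first-order approximation; this is not the authors' implementation.

```python
# Hedged sketch of a MAML-style meta-training loop (Algorithm 1 flavour), using
# the hyperparameters reported under "Experiment Setup". NERModel, the task
# sampler, and the loss below are toy stand-ins, not the paper's code.
import copy
import torch
import torch.nn as nn
from torch.optim import Adam

ALPHA = 3e-5                 # inner-update learning rate (alpha)
BETA = 3e-5                  # meta-update learning rate (beta)
INNER_STEPS = 2              # update steps n (Equation 7)
TASKS_PER_META_UPDATE = 32   # pseudo-NER tasks sampled per meta-update
MAX_META_STEPS = 3           # 3e3 in the paper; kept tiny so the sketch runs fast
NUM_LABELS, HIDDEN = 9, 32   # stand-ins for the tag-set size and encoder width

# Stand-in tagger: one linear layer over precomputed token features
# (the paper fine-tunes cased multilingual BERT-Base instead).
model = nn.Linear(HIDDEN, NUM_LABELS)
meta_optimizer = Adam(model.parameters(), lr=BETA)

def sample_pseudo_task(seq_len=16):
    """Hypothetical sampler: returns (support, query) batches of token
    features and gold labels for one pseudo-NER task."""
    def batch(n):
        return torch.randn(n, seq_len, HIDDEN), torch.randint(NUM_LABELS, (n, seq_len))
    return batch(2), batch(1)   # K = 2 similar examples as support, 1 query

def ner_loss(tagger, batch):
    """Token-level cross-entropy; the paper adds masking and a max-loss term."""
    feats, labels = batch
    logits = tagger(feats)
    return nn.functional.cross_entropy(logits.view(-1, NUM_LABELS), labels.view(-1))

for meta_step in range(MAX_META_STEPS):
    meta_optimizer.zero_grad()
    for _ in range(TASKS_PER_META_UPDATE):
        support, query = sample_pseudo_task()

        # Inner loop: adapt a copy of the model on the task's support set.
        fast_model = copy.deepcopy(model)
        inner_optimizer = Adam(fast_model.parameters(), lr=ALPHA)
        for _ in range(INNER_STEPS):
            inner_optimizer.zero_grad()
            ner_loss(fast_model, support).backward()
            inner_optimizer.step()

        # Outer step (first-order): gradients of the query loss w.r.t. the
        # adapted weights are accumulated onto the original parameters.
        inner_optimizer.zero_grad()
        (ner_loss(fast_model, query) / TASKS_PER_META_UPDATE).backward()
        for p, fp in zip(model.parameters(), fast_model.parameters()):
            p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad

    meta_optimizer.step()
```

A faithful reproduction would back-propagate through the inner updates rather than use this first-order shortcut, and would include the paper's masking scheme and λ-weighted maximum-loss term, which the stubs above omit.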
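A companion sketch of the adaptation phase, continuing the stubs above: before labelling a target-language sentence, a copy of the meta-learned model is briefly fine-tuned on the sentence's K = 2 most similar source-language examples with learning rate γ = 1e-5, as reported in the Experiment Setup row. The similarity retrieval is replaced here by the random sampler stub, and the single adaptation step is an assumption.

```python
# Hedged sketch of test-time adaptation; reuses copy, Adam, ner_loss, and
# sample_pseudo_task from the training-loop sketch above.
GAMMA = 1e-5   # adaptation learning rate (gamma)

def adapt_and_predict(meta_model, support, target_features, adapt_steps=1):
    """Adapt a copy of the meta-learned model on the target sentence's K most
    similar source examples (passed in as `support`), then tag the sentence."""
    adapted = copy.deepcopy(meta_model)
    optimizer = Adam(adapted.parameters(), lr=GAMMA)
    for _ in range(adapt_steps):
        optimizer.zero_grad()
        ner_loss(adapted, support).backward()
        optimizer.step()
    with torch.no_grad():
        return adapted(target_features).argmax(dim=-1)   # label ids per token

# Usage with the stand-ins above: the random sampler plays the role of
# similarity-based retrieval over the source-language training data.
support, (query_feats, _) = sample_pseudo_task()
predictions = adapt_and_predict(model, support, query_feats[0])
```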