Cross-Lingual Ability of Multilingual BERT: An Empirical Study

Authors: Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in the context of three typologically different languages, Spanish, Hindi, and Russian, and using two conceptually different NLP tasks, textual entailment and named entity recognition.
Researcher Affiliation | Collaboration | Karthikeyan K, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh 208016, India, kkarthi@cse.iitk.ac.in; Zihan Wang, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA, zihanw2@illinois.edu; Stephen Mayhew, Duolingo, Pittsburgh, PA 15206, USA, stephen@duolingo.com; Dan Roth, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA, danroth@seas.upenn.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All our models and implementations can be found on our project page: http://cogcomp.org/page/publication_view/900
Open Datasets | Yes | We use the Cross-lingual Natural Language Inference (XNLI) (Conneau et al., 2018) dataset to evaluate cross-lingual TE performance and the LORELEI dataset (Strassel & Tracey, 2016) for cross-lingual NER. (A hedged data-loading sketch follows this table.)
Dataset Splits | Yes | We subsample 80%, 10%, 10% of English NER data as training, development, and testing. (A split sketch follows this table.)
Hardware Specification | No | The paper does not specify hardware details beyond acknowledging cloud resources: "This work was supported by Contracts W911NF-15-1-0461, HR0011-15-C-0113, and HR0011-18-2-0052 from the US Defense Advanced Research Projects Agency (DARPA), by Google Cloud, and by Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)."
Software Dependencies | No | The paper mentions software such as Transformer-based models and the SentencePiece library but does not provide specific version numbers for any software dependencies. (A tokenizer-training sketch follows this table.)
Experiment Setup | Yes | Unless otherwise specified, for B-BERT training, we use a batch size of 32, a learning rate of 0.0001, and 2M training steps. (A hyperparameter sketch follows this table.)
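
The sketches below are illustrative only and are not taken from the authors' released code. First, a minimal sketch of loading the XNLI evaluation data, assuming the Hugging Face `datasets` library (the paper does not name a loading tool, and LORELEI is distributed through the LDC rather than a public hub, so only XNLI is shown):

```python
# Hedged sketch: load the XNLI premise/hypothesis pairs for the three
# target languages studied in the paper. Using the Hugging Face
# `datasets` library is an assumption made for illustration.
from datasets import load_dataset

for lang in ("es", "hi", "ru"):        # Spanish, Hindi, Russian
    xnli = load_dataset("xnli", lang)  # splits: train / validation / test
    example = xnli["test"][0]
    print(lang, example["premise"], example["hypothesis"], example["label"])
```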
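Next, a sketch of the 80%/10%/10% subsampling of the English NER data described above; only the proportions are reported, so the shuffling and fixed seed are assumptions:

```python
# Hedged sketch: split a list of NER documents 80/10/10 into
# train/dev/test. Only the proportions come from the paper; the
# shuffle and the seed are illustrative assumptions.
import random

def split_80_10_10(examples, seed=0):
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n_train = int(0.8 * len(examples))
    n_dev = int(0.1 * len(examples))
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test
```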
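The Software Dependencies row notes that the SentencePiece library is mentioned without a version. A minimal, hedged vocabulary-training call is shown below; the corpus path, vocabulary size, and model prefix are placeholders, not values from the paper:

```python
# Hedged sketch: train a SentencePiece vocabulary for B-BERT pretraining.
# `corpus.txt`, the vocab size, and the model prefix are illustrative
# placeholders; the paper does not report these settings here.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",        # one sentence per line
    model_prefix="bbert_sp",   # writes bbert_sp.model / bbert_sp.vocab
    vocab_size=32000,
)
```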
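Finally, the reported B-BERT training hyperparameters collected into a plain configuration dict; every value not quoted in the Experiment Setup row is an assumption:

```python
# Hedged sketch: the reported B-BERT pretraining hyperparameters.
# Only batch_size, learning_rate, and train_steps come from the paper;
# the remaining fields are illustrative assumptions.
bbert_config = {
    "batch_size": 32,          # reported
    "learning_rate": 1e-4,     # reported
    "train_steps": 2_000_000,  # reported ("2M training steps")
    "max_seq_length": 128,     # assumption, not stated in this excerpt
    "optimizer": "adam",       # assumption, not stated in this excerpt
}
```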