Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Authors: Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in the context of three typologically different languages (Spanish, Hindi, and Russian) and using two conceptually different NLP tasks, textual entailment and named entity recognition. |
| Researcher Affiliation | Collaboration | Karthikeyan K, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh 208016, India (kkarthi@cse.iitk.ac.in); Zihan Wang, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA (zihanw2@illinois.edu); Stephen Mayhew, Duolingo, Pittsburgh, PA 15206, USA (stephen@duolingo.com); Dan Roth, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA (danroth@seas.upenn.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All our models and implementations can be found on our project page: http://cogcomp.org/page/publication_view/900 |
| Open Datasets | Yes | We use the Cross-lingual Natural Language Inference (XNLI) (Conneau et al., 2018) dataset to evaluate cross-lingual TE performance and LORELEI dataset (Strassel & Tracey, 2016) for Cross-Lingual NER. |
| Dataset Splits | Yes | We subsample 80%, 10%, 10% of English NER data as training, development, and testing. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | This work was supported by Contracts W911NF-15-1-0461, HR0011-15-C-0113, and HR0011-18-2-0052 from the US Defense Advanced Research Projects Agency (DARPA), by Google Cloud, and by Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). |
| Software Dependencies | No | The paper mentions software such as a Transformer-based model and the SentencePiece library, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Unless otherwise specified, for B-BERT training, we use a batch size of 32, a learning rate of 0.0001, and 2M training steps. (A hyperparameter sketch appears below the table.) |
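The 80/10/10 subsampling in the Dataset Splits row is easy to reproduce. Below is a minimal Python sketch of such a split; the helper name, the fixed seed, and the assumption that the NER examples arrive as an in-memory list are ours, not details taken from the paper's released code.

```python
import random

def split_80_10_10(examples, seed=42):
    """Shuffle and split examples into 80% train / 10% dev / 10% test.

    Illustrative only: the paper reports the 80/10/10 proportions for its
    English NER data but does not specify a seed or shuffling procedure.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    n_train = int(0.8 * len(data))
    n_dev = int(0.1 * len(data))
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]
    return train, dev, test
```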
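The Experiment Setup row pins down three B-BERT pre-training hyperparameters (batch size 32, learning rate 0.0001, 2M steps). As one way to carry them into a script, here is a hedged sketch using Hugging Face `TrainingArguments`; the paper's own training code may use a different framework, and the output directory is a placeholder of ours.

```python
from transformers import TrainingArguments

# Reported B-BERT pre-training defaults from the paper; the output
# directory is an assumed placeholder, not a path from the release.
args = TrainingArguments(
    output_dir="bbert-pretraining",   # placeholder path
    per_device_train_batch_size=32,   # "a batch size of 32"
    learning_rate=1e-4,               # "a learning rate of 0.0001"
    max_steps=2_000_000,              # "2M training steps"
)
```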