Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Authors: Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in the context of three typologically different languages (Spanish, Hindi, and Russian) and using two conceptually different NLP tasks, textual entailment and named entity recognition. |
| Researcher Affiliation | Collaboration | Karthikeyan K, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh 208016, India (kkarthi@cse.iitk.ac.in); Zihan Wang, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA (zihanw2@illinois.edu); Stephen Mayhew, Duolingo, Pittsburgh, PA 15206, USA (stephen@duolingo.com); Dan Roth, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA (danroth@seas.upenn.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All our models and implementations can be found on our project page: http://cogcomp.org/page/publication_view/900 |
| Open Datasets | Yes | We use the Cross-lingual Natural Language Inference (XNLI) (Conneau et al., 2018) dataset to evaluate cross-lingual TE performance and LORELEI dataset (Strassel & Tracey, 2016) for Cross-Lingual NER. |
| Dataset Splits | Yes | We subsample 80%, 10%, 10% of English NER data as training, development, and testing. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | This work was supported by Contracts W911NF-15-1-0461, HR0011-15-C-0113, and HR0011-18-2-0052 from the US Defense Advanced Research Projects Agency (DARPA), by Google Cloud, and by Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). |
| Software Dependencies | No | The paper mentions software such as a Transformer-based model and the SentencePiece library, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Unless otherwise specified, for B-BERT training, we use a batch size of 32, a learning rate of 0.0001, and 2M training steps. (A hyperparameter sketch appears below the table.) |
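The 80/10/10 subsampling in the Dataset Splits row is easy to reproduce. Below is a minimal Python sketch of such a split; the helper name, the fixed seed, and the assumption that the NER examples arrive as an in-memory list are ours, not details taken from the paper's released code.

```python
import random

def split_80_10_10(examples, seed=42):
    """Shuffle and split examples into 80% train / 10% dev / 10% test.

    Illustrative only: the paper reports the 80/10/10 proportions for its
    English NER data but does not specify a seed or shuffling procedure.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    n_train = int(0.8 * len(data))
    n_dev = int(0.1 * len(data))
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]
    return train, dev, test
```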
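The Experiment Setup row pins down three B-BERT pre-training hyperparameters (batch size 32, learning rate 0.0001, 2M steps). As one way to carry them into a script, here is a hedged sketch using Hugging Face `TrainingArguments`; the paper's own training code may use a different framework, and the output directory is a placeholder of ours.

```python
from transformers import TrainingArguments

# Reported B-BERT pre-training defaults from the paper; the output
# directory is an assumed placeholder, not a path from the release.
args = TrainingArguments(
    output_dir="bbert-pretraining",   # placeholder path
    per_device_train_batch_size=32,   # "a batch size of 32"
    learning_rate=1e-4,               # "a learning rate of 0.0001"
    max_steps=2_000_000,              # "2M training steps"
)
```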