Machine-Created Universal Language for Cross-Lingual Transfer

Authors: Yaobo Liang, Quanzhi Zhu, Junhe Zhao, Nan Duan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that translating into MUL yields improved performance compared to multilingual pre-training, and our analysis indicates that MUL possesses strong interpretability. We conduct experiments on XNLI, NER, MLQA, and Tatoeba using MUL as input.
Researcher Affiliation | Industry | Microsoft Research Asia {yaobo.liang, v-quanzhizhu, v-junhezhao, nanduan}@microsoft.com
Pseudocode | No | The paper describes the steps of its method but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is at: https://github.com/microsoft/Unicoder/tree/master/MCUL.
Open Datasets | Yes | In the first stage, we pre-train the encoder with a multilingual MLM objective on 15 languages of XNLI. The pre-training corpus is CC-Net (Wenzek et al. 2020). In the second stage, we train our model on bilingual data OPUS-100 (Zhang et al. 2020).
Dataset Splits | No | The paper mentions using 'English training data' for some tasks and for pre-training/fine-tuning, but it does not explicitly provide train/validation/test split details (e.g., percentages or sample counts per split) for the datasets used.
Hardware Specification | No | The paper mentions 'Limited by resources' and 'GPU memory usage' but does not specify particular GPU or CPU models, or other hardware components used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x).
Experiment Setup | Yes | Limited by resources, we pre-train the model for 500K steps with a batch size of 8192, which is less than XLM-R Base. The hyper-parameters in pre-training and finetuning are the same as those of natural language. The size of the universal vocabulary K is set to 60K.
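To make the reported setup easier to scan, the following is a minimal configuration sketch based only on the Open Datasets and Experiment Setup rows above. The dataclass structure, field names, and the XLM-R Base backbone note are illustrative assumptions, not the paper's released code; the numeric values (500K steps, batch size 8192, 60K universal vocabulary) and corpus names (CC-Net, OPUS-100) are taken from the quoted excerpts.

```python
# Sketch of the two-stage training setup described in the table above.
# Values marked "from paper" appear in the quoted excerpts; everything else
# (field names, backbone choice, this structure) is an assumption for illustration.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MULTrainingSketch:
    # Stage 1: multilingual MLM pre-training (from paper)
    pretraining_corpus: str = "CC-Net"        # from paper (Wenzek et al. 2020)
    num_languages: int = 15                   # from paper: 15 XNLI languages
    objective: str = "multilingual MLM"       # from paper
    # Stage 2: training on bilingual data (from paper)
    bilingual_corpus: str = "OPUS-100"        # from paper (Zhang et al. 2020)
    # Reported hyper-parameters (from paper)
    pretraining_steps: int = 500_000          # from paper: 500K steps
    batch_size: int = 8192                    # from paper
    universal_vocab_size: int = 60_000        # from paper: K = 60K
    # Assumed detail: the paper compares its budget against XLM-R Base
    backbone: str = "XLM-R Base"              # assumption, not stated as the backbone
    # Downstream evaluation tasks (from paper)
    downstream_tasks: List[str] = field(
        default_factory=lambda: ["XNLI", "NER", "MLQA", "Tatoeba"]
    )


if __name__ == "__main__":
    print(MULTrainingSketch())
```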