On Learning Universal Representations Across Languages
Authors: Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation. Experimental results show that the HICTL outperforms the state-of-the-art XLM-R by an absolute gain of 4.2% accuracy on the XTREME benchmark as well as achieves substantial improvements on both the high-resource and low-resource English→X translation tasks over strong baselines. |
| Researcher Affiliation | Collaboration | 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China ({weixiangpeng,huyue,xingluxi}@iie.ac.cn); 3 Machine Intelligence Technology Lab, Alibaba Group, Hangzhou, China ({wengrx,yuheng.yh,weihua.luowh}@alibaba-inc.com) |
| Pseudocode | No | No structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm 1') were found in the paper. The method is described using mathematical notation and prose. |
| Open Source Code | No | The paper mentions an 'official submission to XTREME (https://sites.research.google/xtreme)' but does not explicitly state that the source code for their methodology is provided or linked. |
| Open Datasets | Yes | During pre-training, we follow Conneau et al. (2020) to build a Common-Crawl corpus using the CCNet (Wenzek et al., 2019) tool for monolingual texts. Table 7 (see appendix A) reports the language codes and data size in our work. For parallel data, we use the same (English-to-X) MT dataset as Conneau & Lample (2019), which is collected from MultiUN (Eisele & Chen, 2010) for French, Spanish, Arabic and Chinese, the IIT Bombay corpus (Kunchukuttan et al., 2018a) for Hindi, OpenSubtitles 2018 for Turkish, Vietnamese and Thai, the EUbookshop corpus for German, Greek and Bulgarian, Tanzil for both Urdu and Swahili, and GlobalVoices for Swahili. Table 8 (see appendix A) shows the statistics of the parallel data. |
| Dataset Splits | Yes | We concatenate newstest 2012 and newstest 2013 as the validation set and use newstest 2014 as the test set. ... We split 7k sentence pairs from the training dataset for validation and concatenate dev2010, dev2012, tst2010, tst2011, tst2012 as the test set. |
| Hardware Specification | Yes | We run the pre-training experiments on 8 V100 GPUs, batch size 1024. |
| Software Dependencies | No | The paper mentions tools like 'multi-bleu.perl' and 'sacreBLEU' and refers to using the 'SentencePiece model with XLM-R', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Hyperparameters for pre-training and fine-tuning are shown in Table 9 (see appendix B). We run the pre-training experiments on 8 V100 GPUs, batch size 1024. The number of negative samples m=512 for word-level contrastive learning. |
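The word-level contrastive objective with m = 512 negative samples noted in the Experiment Setup row can be sketched as a standard InfoNCE-style loss: the anchor's aligned word is scored against the m negatives, and a softmax cross-entropy pulls the positive above them. This is a minimal illustrative sketch, not the authors' implementation; the variable names, toy dimensions, and temperature value are assumptions.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss over one positive and m negatives.

    anchor, positive: (d,) word representations (illustrative shapes).
    negatives: (m, d) matrix of negative samples (m = 512 in the paper's setup).
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Temperature-scaled cosine similarities; the positive sits at index 0.
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / temperature
    # Softmax cross-entropy with target index 0 (log-sum-exp for stability).
    logits -= logits.max()
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
d, m = 16, 512                                   # toy embedding size, paper's m
anchor = rng.standard_normal(d)
positive = anchor + 0.05 * rng.standard_normal(d)  # near-duplicate as the positive
negatives = rng.standard_normal((m, d))
loss = info_nce_loss(anchor, positive, negatives)
```

Minimizing this loss drives the anchor's similarity to its aligned word above its similarity to all 512 negatives, which is the effect the word-level contrastive term is described as having.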