Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Translation Model
Authors: Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, our method establishes the state of the art in fully unsupervised translation tasks of English (En) to Nepali (Ne), Sinhala (Si), Gujarati (Gu), Latvian (Lv), Estonian (Et) and Kazakh (Kk) with BLEU scores of 9.0, 9.5, 17.5, 18.5, 21.0 and 10.0 respectively, and vice versa. This is up to a 4.5 BLEU improvement over the previous state of the art [26]. We also show that the method outperforms other related alternatives that attempt to achieve language separation in various low-resource unsupervised tasks. Plus, our ablation analyses demonstrate the importance of different stages of our method, especially the English separation stage (stage 2). |
| Researcher Affiliation | Collaboration | Xuan-Phi Nguyen (1,3), Shafiq Joty (1,2), Wu Kui (3), Ai Ti Aw (3). Affiliations: (1) Nanyang Technological University; (2) Salesforce Research; (3) Institute for Infocomm Research (I2R), A*STAR, Singapore. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codebase is available at github.com/nxphi47/refine_unsup_multilingual_mt. |
| Open Datasets | Yes | We evaluate our method on the FLoRes [10] low-resource unsupervised translation tasks... We use monolingual corpora from the relevant languages of the Common Crawl dataset, which contains data from a total of 25 languages [17]. |
| Dataset Splits | No | The paper evaluates on the FLoRes tasks, which implies use of a standard test set, but it does not specify explicit training/validation/test splits for its own experimental setup or for the monolingual data used for training. |
| Hardware Specification | No | The paper states 'For each group of N languages, we use N GPUs to train the model' but does not specify the type or model of GPUs or any other hardware components like CPUs or memory. |
| Software Dependencies | No | The paper mentions 'multi-bleu.perl' for evaluation but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, or other specific libraries). |
| Experiment Setup | Yes | Our model is a 12-layer Transformer, whose decoder's FFN layers are replaced with our language-specific sharded FFN layers with σ = 3 (see 3.1) during the first 3 stages. For each group of N languages, we use N GPUs to train the model and shard each language-specific FFN layer to a single GPU... We use a batch size of 1024 tokens with a gradient accumulation factor of 2 [20]... In each stage, we finetune the models for 20K updates. |
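
To make the Experiment Setup row more concrete, below is a minimal PyTorch sketch of what a language-specific FFN layer with a shrink factor σ could look like. This is an illustration only, not the authors' fairseq implementation: the class name `LanguageSpecificFFN`, the routing by an integer `lang_id`, and the interpretation of σ as a width-reduction factor are all assumptions made for the example.

```python
# Hypothetical sketch (not the paper's code): one feed-forward block per
# language, each with its hidden width reduced by a factor sigma, selected
# at run time by a language id. Shapes and names are illustrative.
import torch
import torch.nn as nn


class LanguageSpecificFFN(nn.Module):
    """One FFN sub-layer per language, routed by an integer language id."""

    def __init__(self, d_model: int, d_ffn: int, num_langs: int, sigma: int = 3):
        super().__init__()
        hidden = d_ffn // sigma  # shrink each per-language FFN by sigma (assumption)
        self.ffns = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, hidden),
                nn.ReLU(),
                nn.Linear(hidden, d_model),
            )
            for _ in range(num_langs)
        )

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Route the whole batch through the FFN of the given language.
        return self.ffns[lang_id](x)


# Usage: decoder states of shape (batch, seq, d_model) routed to language 2.
layer = LanguageSpecificFFN(d_model=512, d_ffn=2048, num_langs=6, sigma=3)
x = torch.randn(8, 32, 512)
y = layer(x, lang_id=2)
print(y.shape)  # torch.Size([8, 32, 512])
```

In the paper's actual setup, each per-language FFN is additionally sharded to its own GPU (N GPUs for N languages); the sketch above omits that distribution detail and keeps all per-language blocks on one device.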