Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Cross-model Back-translated Distillation for Unsupervised Machine Translation

Authors: Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen, Kui Wu, Ai Ti Aw

ICML 2021 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In our experiments, CBD achieves the state of the art in the WMT 14 English-French, WMT 16 English-German and English-Romanian bilingual unsupervised translation tasks, with BLEU scores of 38.2, 30.1, and 36.3, respectively. |
| Researcher Affiliation | Collaboration | 1Nanyang Technological University 2Institute for Infocomm Research (I2R), A*STAR 3Salesforce Research Asia. Correspondence to: Xuan-Phi Nguyen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 describes the overall CBD training process, where the ordered pair (θα, θβ) is alternated between (θ1, θ2) and (θ2, θ1) |
| Open Source Code | Yes | Code: https://github.com/nxphi47/multiagent_crosstranslate. |
| Open Datasets | Yes | Specifically, we use all of the monolingual data from 2007-2017 WMT News Crawl datasets, which yield 190M, 78M, 309M and 3M sentences for language English (En), French (Fr), German (De) and Romanian (Ro), respectively. ... The IWSLT 13 En-Fr dataset contains 200K sentences for each language. ... The IWSLT 14 En-De dataset contains 160K sentences for each language. |
| Dataset Splits | Yes | We use the IWSLT15.TED.tst2012 set for validation and the IWSLT15.TED.tst2013 set for testing. ... We split it into 95% for training and 5% for validation, and we use IWSLT14.TED.{dev2010, dev2012, tst2010,tst1011, tst2012} for testing. |
| Hardware Specification | Yes | We train the model with a 2K tokens per batch on a 8-GPU system. ... We use a 4-GPU system to train the models. ... trained using only 1 GPU. |
| Software Dependencies | No | The paper mentions software like 'Moses multi-bleu.perl script', 'XLM', 'MASS', 'Transformer', 'KenLM', 'Byte-Pair Encoding', and 'fastText' but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | We train the model with a 2K tokens per batch on a 8-GPU system. ... Transformers with 6 layers and 1024 model dimensions. ... We follow Lample et al. (2018c) to train the UMT agents with a parameter-shared Transformer (Vaswani et al., 2017) that has 6 layers and 512 dimensions and a batch size of 32 sentences. |
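
The Dataset Splits entry quotes a 95%/5% training/validation split. The quoted text does not say how sentences were assigned to each part, so the following is only a minimal sketch that assumes a seeded random shuffle. The function name `split_train_valid`, the seed, and the toy corpus are hypothetical and are not taken from the paper or its repository.

```python
import random


def split_train_valid(sentences, valid_fraction=0.05, seed=0):
    """Return (train, valid) lists, holding out valid_fraction of the data.

    Assumption: a seeded random shuffle decides which sentences are held out.
    """
    shuffled = list(sentences)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # deterministic for a fixed seed
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy stand-in for a real corpus (hypothetical data).
corpus = [f"sentence {i}" for i in range(1000)]
train, valid = split_train_valid(corpus)
print(len(train), len(valid))
```

Fixing the seed keeps the split identical across runs, which matters when validation scores are compared between experiments.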