Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation
Authors: Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves the state of the art in the WMT'14 English-French, WMT'16 German-English and English-Romanian bilingual unsupervised translation tasks, with 40.2, 36.8, and 37.0 BLEU, respectively. |
| Researcher Affiliation | Collaboration | Meta AI; Nanyang Technological University; Johns Hopkins University |
| Pseudocode | Yes | Algorithm 1 Sinkhorn: Given matrix $Z \in \mathbb{R}^{B \times K}$, which represents the after-exponential latent representations of batches of samples, and n number of iterations; return the sinkhorn prototype output $Q \in \mathbb{R}^{B \times K}$. |
| Open Source Code | Yes | Code: https://github.com/nxphi47/fairseq/tree/swav_umt |
| Open Datasets | Yes | For the WMT'14 English-French (En-Fr), WMT'16 English-German (En-De) and WMT'16 English-Romanian (En-Ro) bilingual UMT tasks, we follow the established predecessors (Lample et al., 2018c; Conneau & Lample, 2019; Song et al., 2019; Nguyen et al., 2021) to use only the monolingual data from 2007-2017 WMT News Crawl datasets of the two languages for each task. |
| Dataset Splits | No | The paper mentions using a 'validation set' for certain metrics (e.g., Global Accuracy) and 'held-out' data for visualizations, but does not specify the full training/validation/test dataset splits with explicit percentages, sample counts, or references to predefined splits for the main UMT tasks. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or types of computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions software like 'Moses multi-bleu.perl script', 'sacrebleu', and 'sentencepiece tokenizer model', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set minimum, maximum lengths of Lmin = 5 and Lmax = 300; source/target length ratio µ 1.5; maximum overlap ratio γi = 0.35 and accept only the top ρ = 5% of mined pairs. The agreement BLEU threshold is β = 30 |
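
The Pseudocode row quotes only the signature of Algorithm 1 (a positive B × K matrix Z and an iteration count n go in, a B × K matrix Q comes out). As a rough illustration of what such a routine typically does, the following is a minimal pure-Python sketch of the standard Sinkhorn-Knopp alternating normalization, as popularized by SwAV. It is an assumption that the paper's Algorithm 1 follows this scheme; the function name `sinkhorn`, the default `n_iters=3`, and the final rescaling step are illustrative choices and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def sinkhorn(z: Matrix, n_iters: int = 3) -> Matrix:
    """Alternately normalize the columns and rows of a positive B x K matrix.

    Assumption: this is the generic SwAV-style Sinkhorn-Knopp procedure,
    not a transcription of the paper's Algorithm 1.
    """
    b = len(z)      # B: number of samples in the batch
    k = len(z[0])   # K: number of prototypes (clusters)

    # Start from a matrix whose entries sum to 1.
    total = sum(sum(row) for row in z)
    q = [[v / total for v in row] for row in z]

    for _ in range(n_iters):
        # Column step: every prototype should receive total mass 1/K.
        col_sums = [sum(q[i][j] for i in range(b)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(b)]
        # Row step: every sample should carry total mass 1/B.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * b) for v in q[i]] for i in range(b)]

    # Rescale so that each row is a probability distribution over prototypes.
    return [[v * b for v in row] for row in q]
```

Calling `sinkhorn(z, n_iters=3)` on a matrix of exponentiated scores yields soft cluster assignments whose rows each sum to 1. With enough iterations, each column's total mass also approaches B/K, which is the balanced-assignment property that motivates using Sinkhorn normalization for clustering.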