Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation

Authors: Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our method achieves the state of the art in the WMT 14 English-French, WMT 16 German-English and English-Romanian bilingual unsupervised translation tasks, with 40.2, 36.8, and 37.0 BLEU, respectively.
Researcher Affiliation	Collaboration	Meta AI Nanyang Technological University Johns Hopkins University
Pseudocode	Yes	Algorithm 1 Sinkhorn: Given matrix Z ∈ ℝ^{B×K}, which represents the after-exponential latent representations of batches of samples, and n number of iterations; return the sinkhorn prototype output Q ∈ ℝ^{B×K}.
Open Source Code	Yes	Code: https://github.com/nxphi47/fairseq/tree/swav_umt
Open Datasets	Yes	For the WMT 14 English-French (En-Fr), WMT 16 English-German (En-De) and WMT 16 English-Romanian (En-Ro) bilingual UMT tasks, we follow the established predecessors (Lample et al., 2018c; Conneau & Lample, 2019; Song et al., 2019; Nguyen et al., 2021) to use only the monolingual data from 2007-2017 WMT News Crawl datasets of the two languages for each task.
Dataset Splits	No	The paper mentions using a 'validation set' for certain metrics (e.g., Global Accuracy) and 'held-out' data for visualizations, but does not specify the full training/validation/test dataset splits with explicit percentages, sample counts, or references to predefined splits for the main UMT tasks.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models or types of computing resources used for the experiments.
Software Dependencies	No	The paper mentions software like 'Moses multi-bleu.perl script', 'sacrebleu', and 'sentencepiece tokenizer model', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	We set minimum and maximum lengths of L_min = 5 and L_max = 300; source/target length ratio µ = 1.5; maximum overlap ratio γ_i = 0.35 and accept only the top ρ = 5% of mined pairs. The agreement BLEU threshold is β = 30.
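
The Pseudocode row above describes a Sinkhorn routine that takes a B×K matrix Z of after-exponential scores and an iteration count n, and returns a prototype assignment matrix Q of the same shape. The sketch below is not the authors' code. It assumes the commonly used alternating column/row normalization (as in SwAV-style clustering); the function name `sinkhorn`, the default iteration count, and the final rescaling so that each row sums to 1 are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn(Z: np.ndarray, n_iters: int = 3) -> np.ndarray:
    """Approximately balance a non-negative (B, K) score matrix.

    Z is assumed to hold after-exponential scores (all entries > 0).
    Returns Q with shape (B, K); each row sums to 1 and, as n_iters
    grows, each column sums to roughly B / K.
    """
    B, K = Z.shape
    Q = Z / Z.sum()  # start from a joint distribution over (sample, prototype)
    for _ in range(n_iters):
        # Column step: give every prototype the same total mass, 1 / K.
        Q = Q / Q.sum(axis=0, keepdims=True) / K
        # Row step: give every sample the same total mass, 1 / B.
        Q = Q / Q.sum(axis=1, keepdims=True) / B
    # Rescale so each row is a distribution over the K prototypes.
    return Q * B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = np.exp(rng.normal(size=(8, 4)))  # hypothetical batch: B = 8, K = 4
    Q = sinkhorn(Z, n_iters=3)
    print(Q.shape, Q.sum(axis=1))
```

Because the row step runs last, the rows of the output sum to 1 for any iteration count; the column balance is only approximate for small n and tightens as n increases.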