Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation

Authors: Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves the state of the art in the WMT 14 English-French, WMT 16 German-English and English-Romanian bilingual unsupervised translation tasks, with 40.2, 36.8, and 37.0 BLEU, respectively.
Researcher Affiliation | Collaboration | Meta AI, Nanyang Technological University, Johns Hopkins University
Pseudocode | Yes | Algorithm 1 Sinkhorn: Given a matrix $Z \in \mathbb{R}^{B \times K}$, which holds the exponentiated latent representations of a batch of samples, and a number of iterations $n$, return the Sinkhorn prototype output $Q \in \mathbb{R}^{B \times K}$.
Open Source Code | Yes | Code: https://github.com/nxphi47/fairseq/tree/swav umt
Open Datasets | Yes | For the WMT 14 English-French (En-Fr), WMT 16 English-German (En-De) and WMT 16 English-Romanian (En-Ro) bilingual UMT tasks, we follow the established predecessors (Lample et al., 2018c; Conneau & Lample, 2019; Song et al., 2019; Nguyen et al., 2021) to use only the monolingual data from 2007-2017 WMT News Crawl datasets of the two languages for each task.
Dataset Splits | No | The paper mentions using a 'validation set' for certain metrics (e.g., Global Accuracy) and 'held-out' data for visualizations, but does not specify the full training/validation/test dataset splits with explicit percentages, sample counts, or references to predefined splits for the main UMT tasks.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or types of computing resources used for the experiments.
Software Dependencies | No | The paper mentions software like 'Moses multi-bleu.perl script', 'sacrebleu', and 'sentencepiece tokenizer model', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We set minimum and maximum lengths of $L_{\min} = 5$ and $L_{\max} = 300$; a source/target length ratio $\mu = 1.5$; a maximum overlap ratio $\gamma_i = 0.35$; and accept only the top 5% of mined pairs. The agreement BLEU threshold is $\beta = 30$.
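The Sinkhorn routine listed under Pseudocode can be illustrated with a small pure-Python sketch. It assumes the standard SwAV-style Sinkhorn-Knopp balancing: normalize the whole matrix, then alternate between giving every prototype (column) equal mass and every sample (row) equal mass, and finally rescale rows into distributions. The function name `sinkhorn` and the default of 3 iterations are illustrative assumptions, not taken from the paper or its repository, whose implementation may differ.

```python
from typing import List

Matrix = List[List[float]]


def sinkhorn(z: Matrix, n_iters: int = 3) -> Matrix:
    """Sinkhorn-Knopp balancing of a B x K matrix of positive scores (sketch).

    Assumes z is already exponentiated (e.g. exp(scores / eps)).
    Returns Q (B x K) whose rows sum to 1 and whose columns each sum to
    roughly B / K, i.e. samples are spread evenly over the K prototypes.
    """
    b, k = len(z), len(z[0])
    total = sum(sum(row) for row in z)
    q = [[v / total for v in row] for row in z]  # all entries now sum to 1
    for _ in range(n_iters):
        # Column step: give every prototype the same total mass, 1 / K.
        col_sums = [sum(q[i][j] for i in range(b)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(b)]
        # Row step: give every sample the same total mass, 1 / B.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * b) for v in q[i]] for i in range(b)]
    # Rescale so each row is a probability distribution over the prototypes.
    return [[v * b for v in row] for row in q]
```

With more iterations the column sums move closer to exactly B / K; because the row step runs last, each row always sums to 1 after the final rescaling.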
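The filtering thresholds quoted under Experiment Setup can be read as a rule-based pass over mined sentence pairs. The sketch below is hypothetical. It assumes tokenized pairs with a precomputed mining score, defines the overlap ratio as the share of source tokens that also appear in the target, applies the length, ratio, and overlap rules before keeping the top 5% by score, and treats $\mu$ as an upper bound on the length ratio in either direction. The names `MinedPair`, `overlap_ratio`, and `filter_pairs` are illustrative. The agreement BLEU check ($\beta = 30$) is left out because the summary does not say how it is applied.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MinedPair:
    src: List[str]  # source-side tokens
    tgt: List[str]  # target-side tokens
    score: float    # hypothetical mining score; higher means more confident


def overlap_ratio(src: List[str], tgt: List[str]) -> float:
    """Hypothetical definition: fraction of source tokens also found in the target."""
    if not src:
        return 0.0
    tgt_vocab = set(tgt)
    return sum(tok in tgt_vocab for tok in src) / len(src)


def filter_pairs(pairs: List[MinedPair], l_min: int = 5, l_max: int = 300,
                 mu: float = 1.5, gamma: float = 0.35,
                 top_frac: float = 0.05) -> List[MinedPair]:
    """Apply length, length-ratio and overlap rules, then keep the top fraction."""
    kept = []
    for p in pairs:
        ls, lt = len(p.src), len(p.tgt)
        # Both sides must fall within [l_min, l_max] (l_min >= 1 avoids division by zero).
        if not (l_min <= ls <= l_max and l_min <= lt <= l_max):
            continue
        # Reject pairs whose lengths differ by more than a factor of mu.
        if max(ls / lt, lt / ls) > mu:
            continue
        # Reject near-copies, whose token overlap exceeds gamma.
        if overlap_ratio(p.src, p.tgt) > gamma:
            continue
        kept.append(p)
    # Keep only the highest-scoring fraction of the surviving pairs.
    kept.sort(key=lambda p: p.score, reverse=True)
    return kept[:math.ceil(top_frac * len(kept))]
```

Running the rule-based checks before the top-5% cut is a design choice of this sketch. Applying the percentage cut first would keep a different, smaller set.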