Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cross-Lingual Bridges with Models of Lexical Borrowing
Authors: Yulia Tsvetkov, Chris Dyer
JAIR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines. We conduct translation experiments with three resource-poor setups: Swahili–English pivoting via Arabic, Maltese–English pivoting via Italian, and Romanian–English pivoting via French. In intrinsic evaluation, Arabic–Swahili, Italian–Maltese, and French–Romanian borrowing models significantly outperform transliteration and cognate discovery models (§5.1). We then provide a systematic quantitative and qualitative analysis of the contribution of integrated translations, relative to baselines and oracles, and on corpora of varying sizes (§5.2). The proposed pivoting approach yields substantial improvements (up to +1.6 BLEU) in Swahili–Arabic–English translation, moderate improvement (up to +0.8 BLEU) in Maltese–Italian–English translation, and small (+0.2 BLEU) but statistically significant improvements in Romanian–French–English. |
| Researcher Affiliation | Academia | Yulia Tsvetkov EMAIL Chris Dyer EMAIL Language Technologies Institute Carnegie Mellon University Pittsburgh, PA, 15213, USA |
| Pseudocode | No | The paper describes the model conceptually and its implementation using transducers (e.g., "Our model is conceptually divided into three main parts:", "The model is implemented as a cascade of finite-state transducers."), but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the third-party tool 'pyfst' and provides its GitHub link: "We use pyfst, a Python interface to OpenFst (Allauzen, Riley, Schalkwyk, Skut, & Mohri, 2007) for the borrowing model implementation.18" with footnote 18: "https://github.com/vchahun/pyfst". However, it does not state that the authors are releasing their own code for the methodology described in this paper. |
| Open Datasets | Yes | We employ Arabic–English and Swahili–English bitexts to extract a training set (corpora of sizes 5.4M and 14K sentence pairs, respectively), using a cognate discovery technique (Kondrak, 2001). ... For the Maltese–English language pair, we sample a parallel corpus of the same size from the EUbookshop corpus from the OPUS collection (Tiedemann, 2012). Similarly, to simulate a resource-poor scenario for the Romanian–English language pair, we sample a corpus from the transcribed TED talks (Cettolo, Girardi, & Federico, 2012). ... For Swahili, we construct a pronunciation dictionary based on the Omniglot grapheme-to-IPA mapping. ... In Arabic, we use the CMU Arabic vowelized pronunciation dictionary containing about 700K types, with an average of four pronunciations per unvowelized input word type (Metze, Hsiao, Jin, Nallasamy, & Schultz, 2010). ... We use the word2vec Skip-gram model (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) to train monolingual vectors, and the CCA-based tool (Faruqui & Dyer, 2014) for projecting word vectors. |
| Dataset Splits | Yes | From the resulting dataset of 490 extracted Arabic–Swahili borrowing examples, we set aside randomly sampled 73 examples (15%) for evaluation, and use the remaining 417 examples for model parameter optimization. For the Italian–Maltese language pair, we use the same technique and extract 425 training and 75 (15%) randomly sampled test examples. For the French–Romanian language pair, we use an existing small annotated set of borrowing examples, with 282 training and 50 (15%) randomly sampled test examples. ... To evaluate translation improvement on corpora of different sizes we conduct experiments with sub-sampled 4,000, 8,000, and 14,000 parallel sentences from the training corpora ... Statistics of the held-out dev and test sets used in all translation experiments are given in Table 9. |
| Hardware Specification | No | Computational resources were provided by Google in the form of a Google Cloud Computing grant and the NSF through the XSEDE program TG-CCR110017. This statement indicates the source of computational resources but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | In all the MT experiments, we use the cdec translation toolkit (Dyer, Lopez, Ganitkevitch, Weese, Ture, Blunsom, Setiawan, Eidelman, & Resnik, 2010), and optimize parameters with MERT (Och, 2003). English 4-gram language models with Kneser-Ney smoothing (Kneser & Ney, 1995) were trained using KenLM (Heafield, 2011). ... We use pyfst, a Python interface to OpenFst (Allauzen, Riley, Schalkwyk, Skut, & Mohri, 2007) for the borrowing model implementation. ... We use the word2vec Skip-gram model (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) to train monolingual vectors, and the CCA-based tool (Faruqui & Dyer, 2014) for projecting word vectors. None of these software mentions includes a specific version number for the tools used. |
| Experiment Setup | No | For parameter estimation, we employ the Nelder–Mead algorithm (Nelder & Mead, 1965), a heuristic derivative-free method that iteratively optimizes, based on an objective function evaluation, the convex hull of n + 1 simplex vertices. The objective function used in this work is the soft accuracy of the development set... We set n and k to 5; we did not experiment with other values. ... We then re-score pronunciations of the donor and loanword candidates using the LMs. We capture this intuition in three features: f1 = pφ(donor), f2 = pφ(loanword), and the harmonic mean between the two scores, f3 = 2f1f2 / (f1 + f2). ... We first train, using large monolingual corpora, 100-dimensional word vector representations for donor and recipient language vocabularies. ... Then, we employ canonical correlation analysis (CCA) with small donor–loanword dictionaries (training sets in the borrowing models) to project the word embeddings into 50-dimensional vectors with maximized correlation between their dimensions. While some specific settings, such as 'n and k to 5' or '100-dimensional word vector representations', are given, the paper lacks comprehensive details such as specific learning rates, batch sizes, number of epochs for MT training, or explicit parameters for the Nelder–Mead algorithm, which are necessary for full reproducibility. |
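The three LM-based scoring features quoted in the Experiment Setup row reduce to a few lines of arithmetic. The sketch below is illustrative only: the function name and the input probabilities are our own, not values from the paper.

```python
def lm_features(p_donor: float, p_loanword: float):
    """Return the three pronunciation-scoring features described in the
    paper: the donor LM score f1, the loanword LM score f2, and their
    harmonic mean f3 = 2*f1*f2 / (f1 + f2)."""
    f1 = p_donor
    f2 = p_loanword
    f3 = 2 * f1 * f2 / (f1 + f2)  # harmonic mean of the two LM scores
    return f1, f2, f3

# Hypothetical LM probabilities for a donor/loanword candidate pair.
f1, f2, f3 = lm_features(0.4, 0.1)
```

For the pair of scores 0.4 and 0.1 above, the harmonic mean f3 comes out to approximately 0.16, which penalizes candidate pairs where either pronunciation scores poorly under its language model.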
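The 15% held-out splits quoted in the Dataset Splits row can be sketched as a seeded shuffle-and-slice; `split_examples` and the seed are hypothetical stand-ins, since the paper does not describe how the random sampling was implemented.

```python
import random

def split_examples(examples, test_fraction=0.15, seed=0):
    """Randomly set aside a test fraction of the examples for evaluation
    and keep the remainder for parameter optimization."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)  # truncate, e.g. 490 -> 73
    return shuffled[n_test:], shuffled[:n_test]

# 490 Arabic-Swahili examples give 417 train / 73 test, matching the
# counts reported in the paper.
train, test = split_examples(range(490))
```

Truncating with `int()` reproduces the paper's 490 → 417/73 and 500 → 425/75 counts exactly; whether the authors truncated or rounded is not stated, so this is an assumption.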