Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Empirical Regularization for Synthetic Sentence Pairs in Unsupervised Neural Machine Translation
Authors: Xi Ai, Bin Fang | Pages: 12471–12479
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments support that our method can generally improve the performance of currently successful models on three similar pairs {French, German, Romanian}↔English and one dissimilar pair Russian↔English with acceptably additional cost. |
| Researcher Affiliation | Academia | Xi Ai, Bin Fang College of Computer Science, Chongqing University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Local Alignment |
| Open Source Code | No | We implement our experiments on Tensorflow 2.0 (Abadi et al. 2016) and will open our source code on GitHub. |
| Open Datasets | Yes | Specifically, we first retrieve monolingual corpora {French, German, English, Russian} from WMT 2018 4 (Bojar et al. 2018) including all available News Crawl datasets from 2007 through 2017 and monolingual corpora Romanian from WMT 2016 5 (Bojar et al. 2016) including News Crawl 2016. |
| Dataset Splits | No | The paper mentions specific test sets ('newstest2014', 'newstest2016') but does not provide explicit details about the train/validation splits (e.g., percentages, sample counts, or references to predefined splits) used from the WMT corpora for reproducing the experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud computing instance types. It only generally refers to 'our machine' without specifics. |
| Software Dependencies | Yes | We implement our experiments on Tensorflow 2.0 (Abadi et al. 2016) |
| Experiment Setup | Yes | Adam optimizer (Kingma and Ba 2015) is used with parameters β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and a dynamic learning rate over the course of training (Vaswani et al. 2017) (warmup steps = 5000). We set dropout regularization with a drop rate = 0.1 and label smoothing with gamma = 0.1 (Mezzini 2018). |
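The dynamic learning rate cited in the Experiment Setup row is the inverse-square-root schedule from Vaswani et al. (2017). As a minimal sketch with the paper's stated warmup_steps = 5000 (the model dimension `d_model = 512` is an assumption; the excerpt above does not state it):

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 5000) -> float:
    """Inverse-square-root schedule (Vaswani et al. 2017):
    linear warmup for `warmup_steps`, then decay proportional to step^-0.5."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The rate rises linearly until step 5000 and decays thereafter; the peak value depends on the assumed `d_model`.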