Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Pareto Front of Multilingual Neural Machine Translation
Authors: Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By training over 200 multilingual models with various model sizes, data sizes, and language directions, we find it interesting that the performance of a certain translation direction does not always improve with the increase of its weight in the multi-task optimization objective. In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; (2) Microsoft Research |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | We release the code at https://github.com/pkunlp-icler/ParetoMNMT for reproduction. |
| Open Datasets | Yes | We use datasets provided in WMT10 (Wang et al., 2020b) and WMT19 (Barrault et al., 2019) to conduct the MNMT experiment. The description of the datasets is listed in Appendix A. Table 5: The dataset descriptions for the main experiments. We randomly choose a subset of the full training set of a direction to form a smaller one. |
| Dataset Splits | No | The paper mentions using "validation loss" for checkpoint selection ("Evaluation is done every 5k steps, and we choose the best checkpoint with lowest average validation loss"), implying the existence of a validation set. However, it does not provide explicit details about the train/validation/test splits (e.g., percentages or sample counts per split) needed to reproduce the partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions using "fairseq (Ott et al., 2019) as the training framework" and the "scipy.optimize.curve_fit function from the scipy library" (footnote 1) but does not specify version numbers for these software components. A hedged usage sketch of curve_fit follows the table. |
| Experiment Setup | Yes | Table 6: Overview of model sizes and optimization hyper-parameters. All models are trained with 4k warmup steps, with the learning rate linearly increasing from 0 to 3e-4 and then decreasing with the inverse_sqrt learning rate scheduler (sketched after the table). The label smoothing term is set to 0.1 following the NMT literature convention. Evaluation is done every 5k steps. |
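
Since the Software Dependencies row cites `scipy.optimize.curve_fit` (footnote 1) without further detail, here is a minimal sketch of how that function is typically invoked. The power-law functional form, the synthetic data, and the parameter names below are illustrative assumptions, not the paper's actual fitting setup.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    # Hypothetical three-parameter form: y = a * x**(-b) + c.
    return a * np.power(x, -b) + c

# Synthetic (sampling weight, validation loss) pairs, for illustration only.
weights = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0])
losses = np.array([3.20, 2.70, 2.30, 2.10, 2.00, 1.95])

# curve_fit returns the fitted parameters and their covariance matrix.
params, covariance = curve_fit(power_law, weights, losses, p0=(1.0, 0.5, 1.0))
a, b, c = params
print(f"fitted: a={a:.3f}, b={b:.3f}, c={c:.3f}")
```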
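The Experiment Setup row describes a linear warmup over 4k steps to a 3e-4 peak followed by inverse_sqrt decay. The sketch below reimplements that schedule under the assumption that it matches fairseq's standard `inverse_sqrt` behavior (decay proportional to 1/sqrt(step) after warmup); it is an illustration, not fairseq's own code.

```python
def inverse_sqrt_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 4000) -> float:
    """Learning rate at a given (1-indexed) optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 toward the peak rate.
        return peak_lr * step / warmup_steps
    # After warmup, decay proportionally to 1/sqrt(step); the factor
    # sqrt(warmup_steps) makes the schedule continuous at the peak.
    return peak_lr * (warmup_steps ** 0.5) * (step ** -0.5)

# The rate peaks at step 4000 (3e-4) and decays afterwards,
# e.g. halving by step 16000.
for s in (1, 2000, 4000, 16000, 100000):
    print(s, f"{inverse_sqrt_lr(s):.2e}")
```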