Less-forgetting Multi-lingual Fine-tuning

Authors: Yuren Mao, Yaobo Liang, Nan Duan, Haobo Wang, Kai Wang, Lu Chen, Yunjun Gao

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference back up our theoretical analysis and validate the superiority of our proposals." "In this section, we perform experimental studies on three downstream tasks, Named Entity Recognition (NER), Question Answering (QA) and Natural Language Inference (NLI), to evaluate the performance of our proposed LF-MLF and verify our theoretical analysis."
Researcher Affiliation | Collaboration | Zhejiang University, Microsoft Research Asia, Shanghai Jiao Tong University. {yuren.mao,wanghaobo,luchen,gaoyj}@zju.edu.cn, {yalia,nanduan}@microsoft.com, w.kai@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: Less-forgetting Multi-lingual Fine-tuning (LF-MLF)
Open Source Code | No | The paper's reproducibility checklist indicates that code is included (likely in the supplementary material), but the main text provides no concrete link or explicit statement about the availability of source code for the described method.
Open Datasets | Yes | "In our experiments, we adopt the NER [25], TyDiQA-GoldP [26] and XNLI [27] datasets for NER, QA and NLI respectively from the XTREME benchmark [8]. The details of these datasets are introduced in the supplementary material." (A hedged dataset-loading sketch follows the table.)
Dataset Splits | No | The paper mentions training epochs and fine-tuning, but does not explicitly specify train/validation/test splits (e.g., percentages or sample counts per split).
Hardware Specification | No | The paper does not report specific hardware (GPU models, CPU types, or cloud computing resources) used for the experiments. The reproducibility checklist mentions the "type of resources used", but this information is absent from the main text.
Software Dependencies | No | The paper names AdamW as the optimizer but provides no version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "In our experiments, we generally fine-tune XLM-RoBERTa (base-sized) models [28] with a training batch size of 32, and AdamW [29] is used as the optimizer. As for the number of fine-tuning epochs, we adopt the default settings of the XTREME benchmark, i.e., 10, 3 and 5 for NER, TyDiQA and XNLI respectively. Besides, the learning rates are selected by grid search from {1e-5, 2e-5, 5e-5}, {1e-5, 2e-5, 5e-5} and {5e-6, 1e-5, 2e-5} for NER, TyDiQA and XNLI respectively. All results are averaged over 3 runs." (A hedged fine-tuning sketch based on this configuration also follows the table.)
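
For readers who want to sanity-check the Open Datasets row, below is a minimal sketch of pulling the three XTREME tasks with the Hugging Face `datasets` library. The paper does not publish data-loading code, so the dataset and configuration names used here ("xtreme", "PAN-X.en", "tydiqa", "XNLI") are assumptions based on the public XTREME release, not the authors' setup.

```python
# Minimal sketch: load the three XTREME tasks quoted in the Open Datasets row.
# Assumption: the Hugging Face hub dataset "xtreme" and the config names below
# match the public XTREME release; they are not taken from the paper.
from datasets import load_dataset

ner = load_dataset("xtreme", "PAN-X.en")   # WikiAnn NER (English portion)
qa = load_dataset("xtreme", "tydiqa")      # TyDiQA-GoldP
nli = load_dataset("xtreme", "XNLI")       # XNLI

for name, ds in [("NER", ner), ("TyDiQA", qa), ("XNLI", nli)]:
    print(name, {split: len(ds[split]) for split in ds})  # inspect available splits
```

The printed split sizes come from the XTREME defaults rather than from anything stated in the paper, which is consistent with the Dataset Splits row being marked No.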
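
The Experiment Setup row quotes concrete hyperparameters but no training script. The sketch below shows one way that configuration (XLM-RoBERTa base, batch size 32, AdamW, the XTREME-default epoch counts and the quoted learning-rate grids) could be wired up with the Hugging Face Trainer. It is an illustration under those assumptions only: the `run_grid` helper, the weight-decay value and the eval-loss selection criterion are hypothetical, and the LF-MLF objective itself is not implemented here.

```python
# Hedged sketch of the quoted fine-tuning configuration; not the authors' code.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,  # sequence-classification head (XNLI-style);
    TrainingArguments,                   # NER/QA would use token-classification / QA heads
    Trainer,
)

MODEL_NAME = "xlm-roberta-base"
EPOCHS = {"ner": 10, "tydiqa": 3, "xnli": 5}          # XTREME defaults quoted above
LR_GRID = {"ner": [1e-5, 2e-5, 5e-5],
           "tydiqa": [1e-5, 2e-5, 5e-5],
           "xnli": [5e-6, 1e-5, 2e-5]}                # grids quoted above

def run_grid(task, train_ds, eval_ds, num_labels):
    """Grid-search the learning rate for one task (datasets assumed pre-tokenized)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    best = None
    for lr in LR_GRID[task]:
        model = AutoModelForSequenceClassification.from_pretrained(
            MODEL_NAME, num_labels=num_labels
        )
        args = TrainingArguments(
            output_dir=f"runs/{task}-lr{lr}",
            per_device_train_batch_size=32,   # batch size reported in the paper
            num_train_epochs=EPOCHS[task],
            learning_rate=lr,                 # Trainer's default optimizer is AdamW
            weight_decay=0.01,                # assumption: not specified in the excerpt
        )
        trainer = Trainer(model=model, args=args,
                          train_dataset=train_ds, eval_dataset=eval_ds,
                          tokenizer=tokenizer)
        trainer.train()
        metrics = trainer.evaluate()
        if best is None or metrics["eval_loss"] < best["eval_loss"]:
            best = metrics
    return best
```

In the paper's setting the selection metric would presumably be the task metric (F1 for NER/TyDiQA, accuracy for XNLI); `eval_loss` is used here only because `Trainer.evaluate()` returns it without extra metric code.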