Less-forgetting Multi-lingual Fine-tuning
Authors: Yuren Mao, Yaobo Liang, Nan Duan, Haobo Wang, Kai Wang, Lu Chen, Yunjun Gao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference back up our theoretical analysis and validate the superiority of our proposals. In this section, we perform experimental studies on three downstream tasks, Named Entity Recognition (NER), Question Answering (QA) and Natural Language Inference (NLI), respectively, to evaluate the performance of our proposed LF-MLF and verify our theoretical analysis. |
| Researcher Affiliation | Collaboration | 1Zhejiang University 2Microsoft Research Asia 3Shanghai Jiao Tong University. {yuren.mao,wanghaobo,luchen,gaoyj}@zju.edu.cn {yalia,nanduan}@microsoft.com, w.kai@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1: Less-forgetting Multi-lingual Fine-tuning (LF-MLF) |
| Open Source Code | No | The paper's checklist indicates that code is included (likely as supplementary material), but the main text provides no concrete link or explicit statement about the availability of the source code for the described method. |
| Open Datasets | Yes | In our experiments, we adopt the NER [25], TyDiQA-GoldP [26] and XNLI [27] datasets for NER, QA and NLI respectively from the XTREME benchmark [8]. The details of these datasets are introduced in the supplementary material. |
| Dataset Splits | No | The paper mentions training epochs and fine-tuning, but does not explicitly specify train/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing resources used for the experiments. An internal checklist mentions 'type of resources used', but this information is not in the main paper text. |
| Software Dependencies | No | The paper mentions using 'AdamW' as an optimizer but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | In our experiments, we generally fine-tune XLM-RoBERTa models (base-sized model) [28] with a training batch size of 32, and AdamW [29] is used as the optimizer. As to the number of fine-tuning epochs, we adopt the default setting of the XTREME benchmark, which is 10, 3, 5 for NER, TyDiQA and XNLI respectively. Besides, the learning rates are selected from {1e-5, 2e-5, 5e-5}, {1e-5, 2e-5, 5e-5} and {5e-6, 1e-5, 2e-5} with grid search for NER, TyDiQA and XNLI respectively. All the results are averaged over 3 runs. |
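
The Open Datasets row above quotes the paper's use of the XTREME benchmark tasks. As a rough reproduction aid, the following is a minimal sketch of how equivalent public versions of the three datasets could be loaded with the Hugging Face `datasets` library; the dataset and configuration identifiers (`wikiann`, `tydiqa`/`secondary_task`, `xnli`) are assumptions about public mirrors, not identifiers given in the paper.

```python
# Hedged sketch: loading public counterparts of the three XTREME tasks
# quoted in the Open Datasets row. Names are assumptions, not from the paper.
from datasets import load_dataset

# WikiANN NER (the NER dataset used in XTREME), English split
ner = load_dataset("wikiann", "en")

# TyDiQA Gold Passage task (the "secondary_task" config)
tydiqa = load_dataset("tydiqa", "secondary_task")

# XNLI, English portion (training data for zero-shot cross-lingual transfer)
xnli = load_dataset("xnli", "en")

# Show which splits each dataset provides
print({name: list(ds.keys()) for name, ds in
       {"ner": ner, "tydiqa": tydiqa, "xnli": xnli}.items()})
```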
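
The Experiment Setup row quotes the fine-tuning hyper-parameters (XLM-RoBERTa base, batch size 32, AdamW, XTREME default epochs, learning-rate grid search averaged over 3 runs). Below is a minimal sketch of that vanilla fine-tuning configuration for the XNLI task only, assuming the Hugging Face `transformers`/`datasets` stack, which the paper does not name; it is not an implementation of the paper's LF-MLF algorithm.

```python
# Hedged sketch: vanilla XNLI fine-tuning with the hyper-parameters quoted
# in the Experiment Setup row. This is NOT the paper's LF-MLF method.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "xlm-roberta-base"      # base-sized model, as in the paper
LEARNING_RATES = [5e-6, 1e-5, 2e-5]  # XNLI grid from the quoted setup
NUM_EPOCHS = 5                       # XTREME default for XNLI


def finetune_xnli(learning_rate: float, seed: int):
    """Fine-tune XLM-R base on English XNLI (zero-shot transfer setting)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=3)

    # English training data; other XNLI languages are used for zero-shot eval.
    dataset = load_dataset("xnli", "en")
    dataset = dataset.map(
        lambda b: tokenizer(b["premise"], b["hypothesis"],
                            truncation=True, max_length=128),
        batched=True)

    args = TrainingArguments(
        output_dir=f"xnli_lr{learning_rate}_seed{seed}",
        per_device_train_batch_size=32,   # batch size 32, as quoted
        num_train_epochs=NUM_EPOCHS,
        learning_rate=learning_rate,      # Trainer's default optimizer is AdamW
        seed=seed,
    )
    trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                      train_dataset=dataset["train"],
                      eval_dataset=dataset["validation"])
    trainer.train()
    return trainer.evaluate()


if __name__ == "__main__":
    # Grid search over the quoted learning rates, averaging over 3 runs.
    for lr in LEARNING_RATES:
        for seed in range(3):
            print(lr, seed, finetune_xnli(lr, seed))
```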