MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
Authors: Zixian Huang, Wenhao Zhu, Gong Cheng, Lei Li, Fei Yuan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. |
| Researcher Affiliation | Academia | Zixian Huang (1), Wenhao Zhu (1), Gong Cheng (1), Lei Li (2), Fei Yuan (3); (1) State Key Laboratory for Novel Software Technology, Nanjing University; (2) Carnegie Mellon University; (3) Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available on https://github.com/CONE-MT/MindMerger. |
| Open Datasets | Yes | Training Datasets. Three categories of training data were used in our methods and baselines. (1) General bilingual pairs. We used translation data from each multilingual language to English and randomly sampled 100K pairs per language (except English) from the Lego-MT [Yuan et al., 2023b] dataset... (2) English task data. We used the MetaMathQA [Yu et al., 2023] and MultiNLI [Williams et al., 2018] datasets... (3) Query translation task data. We used the translated results given by Chen et al. [2023] and the official dev set of XNLI... and translated the X-CSQA training set based on M2M100-1.2B [Fan et al., 2021]. |
| Dataset Splits | No | The paper mentions test sets and training data but does not explicitly describe a validation dataset split or a dedicated validation set size/percentage for the experiments. |
| Hardware Specification | Yes | For all models, we set learning rate=2e-5, batch size=128, max length=512, and epoch=3 and used 8 NVIDIA A100 GPUs for training. |
| Software Dependencies | No | The paper mentions specific models such as Llama2-7B, mT5-xl, and NLLB-200-3.3B, but does not provide version numbers for general software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For all models, we set learning rate=2e-5, batch size=128, max length=512, and epoch=3 and used 8 NVIDIA A100 GPUs for training. |
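
For readers attempting to reproduce the reported setup, the sketch below expresses the stated hyperparameters (learning rate 2e-5, global batch size 128, max sequence length 512, 3 epochs, 8 NVIDIA A100 GPUs) as a HuggingFace-style training configuration. The paper does not name its training framework, so the use of `transformers.TrainingArguments`, the per-device batch split (128 / 8 GPUs = 16), the absence of gradient accumulation, and the bf16 setting are assumptions for illustration, not the authors' actual code.

```python
# Hedged sketch of the reported training setup; NOT the authors' implementation.
# Assumes a HuggingFace Trainer-style pipeline (not specified in the paper).
from transformers import TrainingArguments

NUM_GPUS = 8             # "8 NVIDIA A100 GPUs" (from the paper)
GLOBAL_BATCH_SIZE = 128  # "batch size=128"
MAX_LENGTH = 512         # "max length=512"; applied at tokenization time, not here

training_args = TrainingArguments(
    output_dir="mindmerger-repro",        # hypothetical output path
    learning_rate=2e-5,                   # "learning rate=2e-5"
    num_train_epochs=3,                   # "epoch=3"
    per_device_train_batch_size=GLOBAL_BATCH_SIZE // NUM_GPUS,  # 16, assuming no grad accumulation
    gradient_accumulation_steps=1,        # assumption
    bf16=True,                            # assumption: common on A100s, not stated in the paper
    logging_steps=50,
    save_strategy="epoch",
)
```

Launching with `torchrun --nproc_per_node=8` (or an equivalent multi-GPU launcher) would recover the global batch size of 128 under the per-device split assumed above.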