MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
Authors: Zixian Huang, Wenhao Zhu, Gong Cheng, Lei Li, Fei Yuan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. |
| Researcher Affiliation | Academia | Zixian Huang (1), Wenhao Zhu (1), Gong Cheng (1), Lei Li (2), Fei Yuan (3); (1) State Key Laboratory for Novel Software Technology, Nanjing University; (2) Carnegie Mellon University; (3) Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available on https://github.com/CONE-MT/MindMerger. |
| Open Datasets | Yes | Training Datasets. Three categories of training data were used in our methods and baselines. (1) General bilingual pairs. We used translation data from each multilingual language to English and randomly sampled 100K pairs per language (except English) from the Lego-MT [Yuan et al., 2023b] dataset... (2) English task data. We used the MetaMathQA [Yu et al., 2023] and MultiNLI [Williams et al., 2018] datasets... (3) Query translation task data. We used the translated results given by Chen et al. [2023] and the official dev set of XNLI... and translated the X-CSQA training set based on M2M100-1.2B [Fan et al., 2021]. |
| Dataset Splits | No | The paper mentions test sets and training data but does not explicitly describe a validation dataset split or a dedicated validation set size/percentage for the experiments. |
| Hardware Specification | Yes | For all models, we set learning rate=2e-5, batch size=128, max length=512, and epoch=3 and used 8 NVIDIA A100 GPUs for training. |
| Software Dependencies | No | The paper mentions specific models such as Llama2-7B, mT5-xl, and NLLB-200-3.3B, but does not provide version numbers for general software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For all models, we set learning rate=2e-5, batch size=128, max length=512, and epoch=3 and used 8 NVIDIA A100 GPUs for training. |
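
For readers attempting to reproduce the reported setup, the sketch below expresses the stated hyperparameters (learning rate 2e-5, global batch size 128, max sequence length 512, 3 epochs, 8 NVIDIA A100 GPUs) as a HuggingFace-style training configuration. The paper does not name its training framework, so the use of `transformers.TrainingArguments`, the per-device batch split (128 / 8 GPUs = 16), the absence of gradient accumulation, and the bf16 setting are assumptions for illustration, not the authors' actual code.

```python
# Hedged sketch of the reported training setup; NOT the authors' implementation.
# Assumes a HuggingFace Trainer-style pipeline (not specified in the paper).
from transformers import TrainingArguments

NUM_GPUS = 8             # "8 NVIDIA A100 GPUs" (from the paper)
GLOBAL_BATCH_SIZE = 128  # "batch size=128"
MAX_LENGTH = 512         # "max length=512"; applied at tokenization time, not here

training_args = TrainingArguments(
    output_dir="mindmerger-repro",        # hypothetical output path
    learning_rate=2e-5,                   # "learning rate=2e-5"
    num_train_epochs=3,                   # "epoch=3"
    per_device_train_batch_size=GLOBAL_BATCH_SIZE // NUM_GPUS,  # 16, assuming no grad accumulation
    gradient_accumulation_steps=1,        # assumption
    bf16=True,                            # assumption: common on A100s, not stated in the paper
    logging_steps=50,
    save_strategy="epoch",
)
```

Launching with `torchrun --nproc_per_node=8` (or an equivalent multi-GPU launcher) would recover the global batch size of 128 under the per-device split assumed above.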