Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast

Authors: Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several benchmarks (GSM8K, StrategyQA, MBPP and HumanEval) demonstrate that SCMoE can consistently enhance Mixtral 8x7B's reasoning capability across various domains.
Researcher Affiliation | Collaboration | (1) Tsinghua University, (2) University of Virginia, (3) The University of Hong Kong, (4) Tencent AI Lab
Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code | Yes | Source code is available at https://github.com/DavidFanzz/SCMoE.git
Open Datasets | Yes | For mathematical reasoning and commonsense reasoning, we select GSM8K [19] and StrategyQA [20] respectively... For code generation, we use HumanEval [21] and MBPP [22]...
Dataset Splits | No | The paper lists the datasets used (GSM8K, StrategyQA, MBPP, HumanEval) but does not explicitly provide the train/validation/test split percentages, sample counts, or specific methodology used for splitting these datasets.
Hardware Specification | Yes | The speeds are tested on 4 A100 40G with batch size = 1.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA, or other libraries).
Experiment Setup | Yes | For the penalty strength β, we search from [0.1, 0.3, 0.5, 0.7, 0.9]. Empirically, α is set to 0.1. We choose Mixtral 8x7B [6] as our backbone model. In SCMoE, we use Mixtral 8x7B's default top-2 routing as the strong activation. For the weak activation, we only consider the rank-k routing with k = 2.
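To make the quoted setup concrete, the sketch below illustrates one way a self-contrast decoding step could combine logits from the strong activation (the model's default top-2 routing) and a weak activation (rank-k routing), using a penalty strength β and a plausibility threshold α as in the Experiment Setup row. This is a minimal sketch assuming a contrastive-decoding-style scoring rule; the function names and exact formulation are illustrative and are not taken from the authors' released code.

```python
# Hypothetical sketch of a single self-contrast decoding step, in the spirit of
# SCMoE: contrast logits produced under strong routing (top-2) with logits
# produced under weak routing (rank-k), keeping only plausible candidate tokens.
# Names and the exact scoring rule are assumptions for illustration.
import torch


def self_contrast_step(logits_strong: torch.Tensor,
                       logits_weak: torch.Tensor,
                       beta: float = 0.5,
                       alpha: float = 0.1) -> torch.Tensor:
    """Return adjusted next-token scores for one decoding step.

    logits_strong / logits_weak: shape (vocab_size,), raw logits from the
    strong-activation and weak-activation forward passes of the same MoE model.
    """
    log_p_strong = torch.log_softmax(logits_strong, dim=-1)
    log_p_weak = torch.log_softmax(logits_weak, dim=-1)

    # Plausibility constraint: keep only tokens whose strong-activation
    # probability is within a factor alpha of the most likely token.
    threshold = log_p_strong.max() + torch.log(torch.tensor(alpha))
    valid = log_p_strong >= threshold

    # Contrast: reward the strong activation and penalize the weak one,
    # with beta controlling the penalty strength.
    scores = (1 + beta) * log_p_strong - beta * log_p_weak
    scores[~valid] = float("-inf")
    return scores


if __name__ == "__main__":
    # Toy usage over a small vocabulary with random logits.
    torch.manual_seed(0)
    strong, weak = torch.randn(8), torch.randn(8)
    next_token = self_contrast_step(strong, weak, beta=0.5, alpha=0.1).argmax()
    print(f"greedy next-token id: {next_token.item()}")
```

In this sketch, searching β over [0.1, 0.3, 0.5, 0.7, 0.9] and fixing α = 0.1 would correspond to the hyperparameter sweep quoted above.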