Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
Authors: Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several benchmarks (GSM8K, StrategyQA, MBPP and HumanEval) demonstrate that SCMoE can consistently enhance Mixtral 8x7B's reasoning capability across various domains. |
| Researcher Affiliation | Collaboration | ¹Tsinghua University, ²University of Virginia, ³The University of Hong Kong, ⁴Tencent AI Lab |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | Yes | Source code is available at https://github.com/DavidFanzz/SCMoE.git |
| Open Datasets | Yes | For mathematical reasoning and commonsense reasoning, we select GSM8K [19] and StrategyQA [20] respectively... For code generation, we use HumanEval [21] and MBPP [22]... |
| Dataset Splits | No | The paper lists the datasets used (GSM8K, StrategyQA, MBPP, HumanEval) but does not explicitly provide the train/validation/test split percentages, sample counts, or specific methodology used for splitting these datasets. |
| Hardware Specification | Yes | The speeds are tested on 4 A100 40G with batch size = 1. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA, or other libraries). |
| Experiment Setup | Yes | For the penalty strength β, we search from [0.1, 0.3, 0.5, 0.7, 0.9]. Empirically, α is set to 0.1. We choose Mixtral 8x7B [6] as our backbone model. In SCMoE, we use Mixtral 8x7B's default top-2 routing as the strong activation. For the weak activation, we only consider the rank-k routing with k = 2. |
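
The hyperparameters in the Experiment Setup row can be illustrated with a minimal sketch. This page does not state the scoring formula, so the code assumes the usual contrastive-decoding form: strong and weak next-token logits are combined as (1 + β)·z_strong − β·z_weak, and only tokens whose strong-activation probability is at least α times the maximum are kept. The function name `self_contrast_logits` and the toy logits are hypothetical, and no MoE routing is performed; the two logit vectors stand in for the outputs of the top-2 and rank-2 activations.

```python
import math

# Search grid for the penalty strength, as listed in the Experiment Setup row.
BETA_GRID = [0.1, 0.3, 0.5, 0.7, 0.9]


def self_contrast_logits(strong_logits, weak_logits, beta=0.5, alpha=0.1):
    """Combine strong and weak logits (assumed contrastive-decoding form)."""
    # Softmax over the strong logits, used only to build the plausibility mask.
    peak = max(strong_logits)
    exps = [math.exp(z - peak) for z in strong_logits]
    total = sum(exps)
    strong_probs = [e / total for e in exps]

    # Assumption: keep tokens with probability >= alpha * max probability.
    threshold = alpha * max(strong_probs)

    adjusted = []
    for z_s, z_w, p in zip(strong_logits, weak_logits, strong_probs):
        if p >= threshold:
            # Amplify the strong activation and penalise the weak one.
            adjusted.append((1 + beta) * z_s - beta * z_w)
        else:
            # Implausible tokens are masked out entirely.
            adjusted.append(-math.inf)
    return adjusted


# Toy three-token vocabulary (hypothetical values, not from the paper).
strong = [2.0, 1.0, -3.0]  # stands in for default top-2 routing
weak = [1.0, 1.5, 0.0]     # stands in for rank-k routing with k = 2
scores = self_contrast_logits(strong, weak, beta=0.5, alpha=0.1)
print(scores)
```

In this toy case the third token falls below the α threshold and is masked, while the first two are rescored; sweeping `beta` over `BETA_GRID` mirrors the search described in the table.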