Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks

Authors: Mingze Wang, Weinan E

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To support our main theoretical results, we conduct two new experiments, each aligned with one of our key insights. The experimental details are shown in Appendix C.
Researcher Affiliation	Academia	Mingze Wang School of Mathematical Sciences, Peking University, Beijing, China EMAIL Weinan E Center for Machine Learning Research and School of Mathematical Sciences, Peking University, Beijing, China AI for Science Institute, Beijing, China EMAIL
Pseudocode	No	The paper describes theoretical concepts and proofs, and includes experimental validation. However, it does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described in prose and mathematical formulations.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code or data of the experiments are simple and easy to reproduce following the description in the paper.
Open Datasets	No	Specifically, we consider the low-dimensional manifold $M = \{x \in \mathbb{R}^D : x_1^2 + x_2^2 = 1;\ x_i = 0,\ i > 2\}$ embedded in $\mathbb{R}^D$ with $D > 2$. The target function is $f(x) = \sin(5x_1) + \cos(3x_2)$, defined on $M$. ... As defined in our Figure 3, we consider the piecewise function $f$ with compositional sparsity defined over $3^2 = 9$ unit cubes.
Dataset Splits	No	The experiments use mathematically defined functions and manifolds for evaluation rather than external datasets with predefined splits. There is no mention of train/test/validation splits for any dataset.
Hardware Specification	Yes	The experiments in Section 6 are conducted on 1 A100 GPU.
Software Dependencies	No	In Experiment I, the models are trained for 2,000 iterations with batch size 128 (online), using squared loss and Adam optimizer with learning rate 1e-3. In Experiment II, the models are trained for 5,000 iterations with batch size 128 (online), using squared loss and Adam optimizer with learning rate 1e-3. While Adam optimizer and squared loss are mentioned, no specific software library versions (e.g., PyTorch, TensorFlow) or Python versions are provided.
Experiment Setup	Yes	In Experiment I, the models are trained for 2,000 iterations with batch size 128 (online), using squared loss and Adam optimizer with learning rate 1e-3. ... As a model, we consider 1-4-MoE, a 1-layer MoE comprising 1 router and 4 experts, where each expert is a two-layer ReLU network with hidden width 10. ... In Experiment II, the models are trained for 5,000 iterations with batch size 128 (online), using squared loss and Adam optimizer with learning rate 1e-3. ... As the model, we consider 2-3-MoE (a 2-layer MoE comprising 2 routing layers and 2 expert layers with 3 experts each); ... Each expert is a two-layer ReLU FFN with hidden width $m \in \{16, 32, 64, 128\}$.
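
Illustrative sketch (not the authors' code): since no code is released, the Experiment I setup quoted above can be approximated as follows. This is a minimal standard-library sketch under stated assumptions. The ambient dimension D = 16, the uniform sampling of the angle, the weight initialization, and the top-1 (hard) routing rule are all assumptions not given in the quoted text, and every function name here is hypothetical. Only the manifold, the target function, the batch size of 128, the 4 experts, and the hidden width of 10 come from the table. Training with Adam is not shown.

```python
import math
import random


def sample_manifold_batch(batch_size=128, D=16, rng=None):
    """Draw an online batch on the unit circle in the first two coordinates of R^D.

    D=16 and the uniform angle are assumptions; the source only states D > 2.
    """
    if rng is None:
        rng = random.Random(0)
    xs, ys = [], []
    for _ in range(batch_size):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = [0.0] * D
        x[0], x[1] = math.cos(theta), math.sin(theta)  # x_1^2 + x_2^2 = 1
        xs.append(x)
        # Target from the table: f(x) = sin(5 x_1) + cos(3 x_2)
        ys.append(math.sin(5.0 * x[0]) + math.cos(3.0 * x[1]))
    return xs, ys


def init_linear(n_out, n_in, rng):
    """Random weights and zero bias (initialization scheme is an assumption)."""
    s = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def linear(W, b, x):
    """Compute W x + b for a list-of-lists weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def make_moe(D, n_experts=4, width=10, rng=None):
    """One linear router plus n_experts two-layer ReLU experts of the given width."""
    if rng is None:
        rng = random.Random(0)
    router = init_linear(n_experts, D, rng)
    experts = [
        (init_linear(width, D, rng), init_linear(1, width, rng))
        for _ in range(n_experts)
    ]
    return router, experts


def moe_forward(moe, x):
    """Route x to the single highest-scoring expert (top-1 routing is assumed)."""
    router, experts = moe
    logits = linear(router[0], router[1], x)
    k = max(range(len(logits)), key=logits.__getitem__)
    (W1, b1), (W2, b2) = experts[k]
    y = linear(W2, b2, relu(linear(W1, b1, x)))[0]
    return y, k


if __name__ == "__main__":
    xs, ys = sample_manifold_batch()
    moe = make_moe(D=len(xs[0]))
    pred, expert_id = moe_forward(moe, xs[0])
    print("squared loss on one sample:", (pred - ys[0]) ** 2, "expert:", expert_id)
```

A reproduction would wrap `sample_manifold_batch` in a loop of 2,000 iterations and update the parameters with Adam at learning rate 1e-3 in a framework of choice; the table notes that the paper does not name one.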