Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

On Linear Mode Connectivity of Mixture-of-Experts Architectures

Authors: Viet-Hoang Tran, Van Hoan Trinh, Khanh-Vinh Bui, Tan M. Nguyen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we empirically validate the presence of LMC using our proposed algorithm across diverse MoE configurations including dense, sparse, and shared-expert variants under a wide range of model settings and datasets of varying scales and modalities. Our results confirm the existence of LMC in MoE architectures and offer fundamental insights into the functional landscape and optimization dynamics of deep learning models.
Researcher Affiliation	Academia	Viet-Hoang Tran, Department of Mathematics, National University of Singapore; Van-Hoan Trinh, Department of Mathematics, Technical University of Munich; Khanh Vinh Bui, Independent Researcher, Ho Chi Minh City, Vietnam; Tan M. Nguyen, Department of Mathematics, National University of Singapore
Pseudocode	Yes	Algorithm 1: Weight Matching for Mixture-of-Experts. Input: MoE model weights ϕ = (W_i, b_i, θ_i)_{i=1,…,n} and ϕ' = (W'_i, b'_i, θ'_i)_{i=1,…,n}. Output: permutation τ for experts, and permutations {P_i}_{i=1}^n for hidden units. % Step 1: match expert order using two methods. for method in {gate, expert} do: compute cost matrix C; solve the LAP to obtain expert permutation τ_method; end for. % The two candidate expert orderings τ_gate and τ_expert are obtained. % Step 2: align internal weights of matched expert pairs. for method in {gate, expert} do: for i = 1 to n do: compute P_i by applying Weight Matching to θ_i and θ'_{τ_method(i)}; end for; end for. return τ_gate, ({P_i}_{i=1}^n)_gate, τ_expert, ({P_i}_{i=1}^n)_expert
Open Source Code	Yes	The code is publicly available at https://github.com/MLResearchX/lmc-moe.
Open Datasets	Yes	For vision tasks, our study includes MNIST, CIFAR-10, CIFAR-100, ImageNet-1k, as well as transfer learning scenarios from ImageNet-21k to CIFAR-10 and CIFAR-100. For language modeling, we utilize WikiText-103 and the One Billion Word dataset.
Dataset Splits	Yes	Experimental Design. We investigate LMC by replacing the Feedforward Network (FFN) in the Transformer [84] layer with a randomly initialized MoE, based on empirical evidence from Section 6.4, Appendices E, G.2, and G.3 indicating lower perturbation sensitivity when replacing deeper FFN layers with MoEs. Only the MoE parameters are fine-tuned using multiple random seeds for each experiment. LMC is evaluated by linearly interpolating between all checkpoint pairs, measuring model performance on the test set at 25 evenly spaced points along the interpolation... Datasets and Models. We use ViT [20] for image classification (MNIST [51], CIFAR-10/100 [47], ImageNet [18]) and GPT-2 [64] for language modeling (WikiText-103 [59] and One Billion Word [14]).
Hardware Specification	Yes	All experiments are executed on a single NVIDIA H100 GPU with 80GB of memory, except for the One Billion Word task, which utilizes two H100 GPUs.
Software Dependencies	No	Due to the use of the JAX framework, approximately 75% of GPU memory (around 60GB) is pre-allocated by default.
Experiment Setup	Yes	Hyperparameters such as batch size, optimizer, number of experts, and hidden size are fixed, while the learning rate is tuned per setting. Full details are provided in Appendix F.
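The Pseudocode row quotes a two-step weight-matching procedure for MoE layers: first find an expert permutation τ (from gate weights or from expert weights), then align hidden units inside each matched expert pair. The NumPy sketch below illustrates that structure on a toy model; it is not the authors' code (see their repository for that). Assumptions, all hypothetical: each expert is a two-layer MLP stored as a dict with `W1`, `b1`, `W2`; the gate is a linear layer with rows `W` and biases `b`; the LAP maximizes a dot-product similarity (equivalent to minimizing its negative as a cost); and `solve_lap_max` is a brute-force stand-in for a real solver such as `scipy.optimize.linear_sum_assignment`, usable only at toy sizes.

```python
import itertools

import numpy as np


def solve_lap_max(score):
    # Brute-force linear assignment: returns perm maximizing sum_i score[i, perm[i]].
    # Exponential in n, so toy sizes only; at scale one would call
    # scipy.optimize.linear_sum_assignment(score, maximize=True) instead.
    n = score.shape[0]
    best = max(itertools.permutations(range(n)),
               key=lambda p: sum(score[i, p[i]] for i in range(n)))
    return np.array(best)


def hidden_unit_scores(ea, eb):
    # Similarity of hidden unit j in expert ea to unit k in expert eb, summing the
    # contributions of incoming weights W1, biases b1 and outgoing weights W2.
    return (ea["W1"] @ eb["W1"].T + np.outer(ea["b1"], eb["b1"])
            + ea["W2"].T @ eb["W2"])


def match_hidden(ea, eb):
    # Weight matching inside one expert pair; perm[j] is eb's unit paired with ea's unit j.
    s = hidden_unit_scores(ea, eb)
    perm = solve_lap_max(s)
    return perm, float(sum(s[j, perm[j]] for j in range(len(perm))))


def weight_matching_moe(model_a, model_b):
    ea, eb = model_a["experts"], model_b["experts"]
    n = len(ea)
    ga, gb = model_a["gate"], model_b["gate"]
    # Step 1, "gate" method: compare the gating rows (W_i, b_i) of both models.
    gate_score = ga["W"] @ gb["W"].T + np.outer(ga["b"], gb["b"])
    # Step 1, "expert" method: compare experts after aligning their hidden units.
    expert_score = np.array([[match_hidden(ea[i], eb[k])[1] for k in range(n)]
                             for i in range(n)])
    results = {}
    for method, score in (("gate", gate_score), ("expert", expert_score)):
        tau = solve_lap_max(score)
        # Step 2: align the internal weights of every matched expert pair (i, tau[i]).
        perms = [match_hidden(ea[i], eb[tau[i]])[0] for i in range(n)]
        results[method] = (tau, perms)
    return results


def random_moe(rng, n_experts=4, d=3, h=4):
    # Toy MoE: a linear gate plus two-layer MLP experts (relu(W1 x + b1), then W2).
    return {
        "gate": {"W": rng.normal(size=(n_experts, d)), "b": rng.normal(size=n_experts)},
        "experts": [{"W1": rng.normal(size=(h, d)), "b1": rng.normal(size=h),
                     "W2": rng.normal(size=(d, h))} for _ in range(n_experts)],
    }


def permuted_copy(model, rng):
    # Functionally identical MoE: shuffle experts (with their gate rows) and hidden units.
    q = rng.permutation(len(model["experts"]))
    experts = []
    for k in q:
        e = model["experts"][k]
        p = rng.permutation(len(e["b1"]))
        experts.append({"W1": e["W1"][p], "b1": e["b1"][p], "W2": e["W2"][:, p]})
    return {"gate": {"W": model["gate"]["W"][q], "b": model["gate"]["b"][q]},
            "experts": experts}


rng = np.random.default_rng(0)
model_a = random_moe(rng)
model_b = permuted_copy(model_a, rng)
for method, (tau, perms) in weight_matching_moe(model_a, model_b).items():
    aligned_ok = all(
        np.allclose(model_b["experts"][tau[i]]["W1"][perms[i]], model_a["experts"][i]["W1"])
        for i in range(len(tau)))
    print(method, tau, aligned_ok)
```

Because `model_b` is an exact permuted copy of `model_a`, both methods should recover the hidden symmetry in this toy. With independently trained checkpoints the two orderings τ_gate and τ_expert can disagree, which is presumably why the algorithm returns both candidates.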
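The Dataset Splits row states that LMC is assessed by linearly interpolating between checkpoint pairs and evaluating at 25 evenly spaced points. A minimal sketch of that evaluation loop follows. The `loss_barrier` summary (maximum gap between the interpolated loss and the straight line joining the endpoint losses) is a common convention in the LMC literature and is assumed here; the row does not say which metric the paper reports. The toy loss functions stand in for test-set evaluation and are purely illustrative.

```python
import numpy as np


def interpolate(params_a, params_b, alpha):
    # theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b, applied to every array.
    return {k: (1.0 - alpha) * params_a[k] + alpha * params_b[k] for k in params_a}


def lmc_curve(params_a, params_b, loss_fn, num_points=25):
    # Evaluate the loss at evenly spaced alpha in [0, 1], endpoints included.
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array([loss_fn(interpolate(params_a, params_b, a)) for a in alphas])
    return alphas, losses


def loss_barrier(alphas, losses):
    # Largest gap between the interpolated loss and the chord joining the endpoint
    # losses; a value near zero is the usual reading of "LMC holds".
    chord = (1.0 - alphas) * losses[0] + alphas * losses[-1]
    return float(np.max(losses - chord))


# Toy check 1: a convex loss (least squares) lies below the chord, so no barrier.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
mse = lambda p: float(np.mean((X @ p["w"] - y) ** 2))
alphas, losses = lmc_curve({"w": rng.normal(size=3)}, {"w": rng.normal(size=3)}, mse)
convex_barrier = loss_barrier(alphas, losses)

# Toy check 2: the double well (w**2 - 1)**2 has minima at w = -1 and w = +1, related
# by a sign symmetry, with a barrier of height 1 at w = 0 between them.
double_well = lambda p: float(np.sum((p["w"] ** 2 - 1.0) ** 2))
alphas, losses = lmc_curve({"w": np.array([-1.0])}, {"w": np.array([1.0])}, double_well)
well_barrier = loss_barrier(alphas, losses)
print(convex_barrier, well_barrier)
```

The double well loosely mirrors why alignment matters: its two minima are equivalent up to a symmetry (here a sign flip, for MoEs an expert and hidden-unit permutation), and interpolating without first undoing that symmetry crosses a barrier.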
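The Software Dependencies row notes that JAX pre-allocates about 75% of GPU memory by default, which on an 80 GB H100 is about 60 GB. When reproducing on a smaller or shared GPU, that default can be changed through JAX's documented XLA client environment variables, as in the sketch below. The `preallocated_gb` helper is hypothetical and only restates the arithmetic.

```python
import os

# JAX reads these XLA client flags when its GPU backend initializes, so set them
# before `import jax` (or export them in the shell that launches the job).
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"  # allocate on demand instead
# Alternatively, keep preallocation but cap it, e.g. at half of the device:
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".50"


def preallocated_gb(device_gb, fraction=0.75):
    # Hypothetical helper: memory reserved up front for a given preallocation fraction.
    return device_gb * fraction


print(preallocated_gb(80))  # 80 GB H100 at the default 75% -> 60.0
```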