Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Enhanced Expert Merging for Mixture-of-Experts in Graph Foundation Models

Authors: Lei Liu, Xingyu Xia, Qianqian Xie, Ben Liu, Wenjie Xu, Min Peng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our methods effectively enhance model performance. We validate our approaches through extensive experiments on 38 graph datasets across various domains and downstream tasks, demonstrating their effectiveness and efficiency.
Researcher Affiliation	Academia	(1) School of Computer Science, Wuhan University, Wuhan, China; (2) School of Artificial Intelligence, Wuhan University, Wuhan, China; (3) Center for Language and Information Research, Wuhan University, Wuhan, China
Pseudocode	Yes	Algorithm 1: Training Pipeline of KDEM; Algorithm 2: Training Pipeline of PPEM
Open Source Code	Yes	We use only publicly available datasets (such as those released by AnyGraph https://huggingface.co/datasets/hkuds/AnyGraph_datasets), and we include an anonymized zip file with all our codes.
Open Datasets	Yes	Our study leverages an extensive collection of graph datasets drawn from diverse domains, comprising a total of 14,437,372 nodes and 199,265,688 edges. All datasets in this study are obtained from prior research [33, 21, 40]. The dataset statistics are presented in Table 5. These datasets are organized into multiple categories and groups to address specific research questions and evaluation needs.
Dataset Splits	Yes	We follow the same dataset splits as prior works [33, 21, 40]. Our method adopts the same zero-shot setting as AnyGraph. We train two separate models on Link1 and Link2 respectively, then evaluate the zero-shot performance of the Link1-trained model on Link2 and vice versa. Models are cross-applied between Link1 and Link2's domain-specific data.
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA GeForce RTX 3090 GPU (24GB VRAM).
Software Dependencies	No	Our methods are implemented using PyTorch, where SVD is computed via the built-in svd_lowrank function for enhanced computational efficiency.
Experiment Setup	Yes	The models are trained for 100 epochs using the Adam optimizer with a batch size of 4096 and a learning rate of either 1 × 10^-4 (Link1) or 2 × 10^-4 (Link2). The number of message passing layers L is set to 3. We use 8 expert models, each consisting of an 8-layer MLP with 512 units per layer. The training step interval T1 for knowledge distillation is set to 100, with a loss weight γ = 0.01. The training step interval T2 for the exponential moving average (Eq. (13)) is selected from {10, 20, 50, 100, 200}, with the decay factor β chosen from {0.99, 0.999, 0.9995, 0.9999}. The recalibration strength parameter ρ (Eq. (23)) is set to 0.2.
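
For readers who want to re-run the reported setup, the hyperparameters in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only and is not taken from the authors' code: the class name `ExperimentConfig`, every field name, and the helper `search_grid` are hypothetical, and the learning rates assume the exponents in the extracted text read 10^-4. It enumerates the T2 and β combinations listed in the row; the row does not say which settings apply to KDEM versus PPEM, so the sketch keeps them all in one place.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    train_set: str                        # "Link1" or "Link2"
    learning_rate: float                  # assumed 1e-4 (Link1) or 2e-4 (Link2)
    ema_interval: int                     # T2, step interval for the EMA update
    ema_decay: float                      # beta, EMA decay factor
    epochs: int = 100
    batch_size: int = 4096
    optimizer: str = "Adam"
    num_message_passing_layers: int = 3   # L
    num_experts: int = 8
    expert_mlp_layers: int = 8
    expert_hidden_units: int = 512
    kd_interval: int = 100                # T1, step interval for distillation
    kd_loss_weight: float = 0.01          # gamma
    recalibration_strength: float = 0.2   # rho


# Values as listed in the row; the names are assumptions for illustration.
LEARNING_RATES = {"Link1": 1e-4, "Link2": 2e-4}
EMA_INTERVALS = (10, 20, 50, 100, 200)
EMA_DECAYS = (0.99, 0.999, 0.9995, 0.9999)


def search_grid(train_set: str) -> list[ExperimentConfig]:
    """Enumerate every (T2, beta) combination for one training set."""
    lr = LEARNING_RATES[train_set]
    return [
        ExperimentConfig(train_set, lr, t2, beta)
        for t2, beta in product(EMA_INTERVALS, EMA_DECAYS)
    ]


if __name__ == "__main__":
    for cfg in search_grid("Link1")[:3]:
        print(cfg.train_set, cfg.learning_rate, cfg.ema_interval, cfg.ema_decay)
```

The grid has 5 × 4 = 20 candidates per training set. The row does not state which combination was finally selected, so a reproduction would need to either sweep all of them or consult the paper's appendix.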