Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement

Authors: Yinlin Zhu, Xunkai Li, Jishuo Jia, Miao Hu, Di Wu, Meikang Qiu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our Contributions. (1) Problem Identification. To the best of our knowledge, this is the first exploration of the FedGFM paradigm, which organically combines FGL and GFM to offer a practical solution for training graph foundation model across silos with diverse graph domain and tasks. (2) In-depth Investigation. (Sec. 3) We conduct an in-depth empirical investigation for FedGFM, assessing its feasibility and revealing a non-trivial challenge named knowledge entanglement, providing valuable insights for its development. (3) Novel Framework. (Sec. 4) We propose a novel and effective FedGFM framework named FedGFM+, which employs two key modules to address the knowledge entanglement challenge, including AncDAI from the global perspective and AdaDPP from the local perspective. (4) State-of-the-art Performance. (Sec. 5) Extensive experimental results on graph learning with 8 cross-task and cross-domain datasets demonstrate the superiority of FedGFM+ compared with 20 baselines, including 5 isolated supervised learning methods, 10 FGL techniques, and 5 federated variants of centralized GFM training strategies. Section 5 is titled "Experiments" and contains subsections such as "5.2 Performance Comparison" and "5.3 Ablation Study", which indicate empirical research.
Researcher Affiliation	Academia	(1) Sun Yat-sen University, Guangzhou, China; (2) Beijing Institute of Technology, Beijing, China; (3) Shandong University, Weihai, China; (4) Augusta University, Augusta, Georgia, USA
Pseudocode	No	The paper describes methods using mathematical equations and natural language, but does not include explicit pseudocode blocks or algorithm listings. For example, Section 2 describes the Graph Vector Quantization-Variational Auto-Encoder using equations (1) and (2).
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Yes, we have included the source code in the supplementary materials to enable interested researchers to reproduce the experimental results presented in our paper with sufficient guidance.
Open Datasets	Yes	We utilize 8 datasets from various domains and tasks, as detailed in Table 4. Table 4: The statistics of evaluated datasets in our experiments (columns: Dataset | Domain | Task | # Graphs | Avg. # Nodes | Avg. # Edges | # Classes). Cora | Citation | Node | 1 | 2,708 | 10,556 | 7; PubMed | Citation | Node | 1 | 19,717 | 44,338 | 3; Arxiv | Citation | Node | 1 | 169,343 | 1,166,243 | 40; WikiCS | Hyperlink | Node | 1 | 11,701 | 216,123 | 10; FB15K237 | Knowledge | Link | 1 | 14,541 | 310,116 | 237; WN18RR | Knowledge | Link | 1 | 40,943 | 93,003 | 11; PCBA | Molecule | Graph | 437,929 | 26.0 | 28.1 | 128; HIV | Molecule | Graph | 41,127 | 25.5 | 27.5 | 2.
Dataset Splits	Yes	Finally, the default train/validation/test splits used in the fine-tuning stage are summarized in Table 5. Notably, due to the distributed nature of federated settings, the training set proportion is typically much higher than in centralized graph learning paradigms. This splitting strategy has been widely adopted in prior works [27]. Table 5: Train/Validation/Test splits for different datasets (train / validation / test). Cora: 5% / 20% / 40%; PubMed: 60% / 20% / 20%; WikiCS: 80% / 10% / 10%; Arxiv: 80% / 10% / 10%; WN18RR: 80% / 10% / 10%; FB15k237: 80% / 10% / 10%; Chem HIV: 80% / 10% / 10%; Chem PCBA: 80% / 10% / 10%.
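Table 5 gives only the proportions and does not describe a procedure. The following minimal sketch applies those proportions by taking a seeded random permutation over whatever items a task predicts on (nodes, edges, or graphs). It is not the authors' splitting code. `SPLITS` and `split_indices` are hypothetical names, and the sketch does not model how splits interact with client partitioning. The Cora row sums to 65%, so the sketch accepts fractions that sum to less than 1 and leaves the remainder unassigned rather than guessing the intended values.

```python
import random

# Hypothetical table of (train, validation, test) fractions copied from Table 5.
# The Cora row sums to 0.65; the remaining items are simply left unassigned.
SPLITS = {
    "Cora": (0.05, 0.20, 0.40),
    "PubMed": (0.60, 0.20, 0.20),
    "WikiCS": (0.80, 0.10, 0.10),
    "Arxiv": (0.80, 0.10, 0.10),
    "WN18RR": (0.80, 0.10, 0.10),
    "FB15k237": (0.80, 0.10, 0.10),
    "Chem HIV": (0.80, 0.10, 0.10),
    "Chem PCBA": (0.80, 0.10, 0.10),
}


def split_indices(num_items, fractions, seed=0):
    """Randomly partition range(num_items) into disjoint train/val/test lists.

    fractions: (train, val, test) proportions, non-negative, summing to <= 1.
    Assumption: a plain seeded random permutation, not the paper's procedure.
    """
    if min(fractions) < 0 or sum(fractions) > 1 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to <= 1")
    train_frac, val_frac, test_frac = fractions
    perm = list(range(num_items))
    random.Random(seed).shuffle(perm)  # reproducible shuffle for a fixed seed
    n_train = int(train_frac * num_items)  # floor of each share
    n_val = int(val_frac * num_items)
    n_test = int(test_frac * num_items)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


if __name__ == "__main__":
    # Cora has 2,708 nodes according to Table 4.
    train, val, test = split_indices(2708, SPLITS["Cora"], seed=42)
    print(len(train), len(val), len(test))  # prints: 135 541 1083
```

Because each share is floored, at most a few items per split are dropped. Any leftover items, including the 35% that the Cora row leaves out, stay outside all three splits.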
Hardware Specification	Yes	C.7 Experimental Environment: The experimental machine is an Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz and NVIDIA A100 with 80GB memory and CUDA 12.4. The operating system is Ubuntu 22.04.5 with 251GB memory.
Software Dependencies	Yes	C.7 Experimental Environment: The experimental machine is an Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz and NVIDIA A100 with 80GB memory and CUDA 12.4. The operating system is Ubuntu 22.04.5 with 251GB memory. C.6 Hyperparameters: For federated variants of centralized GFM Baselines, we adopt the hyperparameter configurations reported in their original papers whenever available. When unspecified, we employ automated hyperparameter optimization using the Optuna framework [2].
Experiment Setup	Yes	C.6 Hyperparameters: For Isolated Supervised Learning Baselines, we perform 1,000 epochs of local training with early stopping based on validation performance. For FL/FGL Baselines, we conduct 100 communication rounds, where each round includes 2 local training epochs. We use the Adam optimizer with a learning rate of 1e-2, weight decay of 5e-4, and dropout rate of 0.5. For federated variants of centralized GFM Baselines, we adopt the hyperparameter configurations reported in their original papers whenever available. When unspecified, we employ automated hyperparameter optimization using the Optuna framework [2]. Federated pre-training is carried out for 50 communication rounds, each consisting of 2 local pre-training epochs. For our proposed FedGFM+ framework, we fix the learning rate for pre-training to 1e-4. During fine-tuning, we perform a grid search over learning rates in {10e-5, 10e-4, 10e-3, 10e-2, 10e-1} for each dataset. The weight decay is fixed to 5e-4, and the batch size is set to 1,024. Federated pre-training is conducted for 25 communication rounds, with 2 local epochs per round.
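The C.6 values can be collected into one configuration object, which makes the three federated schedules easy to compare. The following is a minimal Python sketch, not the authors' code. `FedGFMPlusConfig`, `SCHEDULES`, `local_epochs_per_client`, and `grid_search` (with its `evaluate` callback) are hypothetical names, and the epoch totals assume every client trains in every round. The learning-rate grid is copied literally. In standard notation 10e-5 equals 1e-4, so the grid as written spans 1e-4 to 1.0. It may have been meant as 1e-5 to 1e-1, but the source does not confirm this.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class FedGFMPlusConfig:
    """Hypothetical container for the FedGFM+ values reported in C.6."""
    pretrain_lr: float = 1e-4          # fixed pre-training learning rate
    pretrain_rounds: int = 25          # federated communication rounds
    local_epochs_per_round: int = 2    # local pre-training epochs per round
    weight_decay: float = 5e-4
    batch_size: int = 1024
    # Fine-tuning grid copied literally: 10e-5 == 1e-4, ..., 10e-1 == 1.0.
    finetune_lr_grid: Tuple[float, ...] = (10e-5, 10e-4, 10e-3, 10e-2, 10e-1)


# (communication rounds, local epochs per round) as reported in C.6.
SCHEDULES = {
    "FL/FGL baselines": (100, 2),
    "federated GFM baselines (pre-training)": (50, 2),
    "FedGFM+ (pre-training)": (25, 2),
}


def local_epochs_per_client(rounds: int, epochs_per_round: int) -> int:
    """Total local epochs one client runs, assuming it joins every round."""
    return rounds * epochs_per_round


def grid_search(evaluate: Callable[[float], float],
                lr_grid: Iterable[float]) -> Tuple[Optional[float], float]:
    """Return (best_lr, best_score) maximizing evaluate(lr) over the grid."""
    best_lr, best_score = None, float("-inf")
    for lr in lr_grid:
        score = evaluate(lr)  # e.g. validation score after fine-tuning
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score


def toy_eval(lr: float) -> float:
    """Stand-in for a real fine-tuning run; peaks at lr = 1e-3."""
    return -abs(math.log10(lr) + 3)


if __name__ == "__main__":
    cfg = FedGFMPlusConfig()
    for name, (rounds, epochs) in SCHEDULES.items():
        print(f"{name}: {local_epochs_per_client(rounds, epochs)} local epochs")
    print(grid_search(toy_eval, cfg.finetune_lr_grid))
```

Under the full-participation assumption, each client runs 200 local epochs for the FL/FGL baselines, 100 for the federated GFM baselines' pre-training, and 50 for FedGFM+ pre-training. In practice, `toy_eval` would be replaced by a per-dataset fine-tuning run that returns a validation metric.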