Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FedIGL: Federated Invariant Graph Learning for Non-IID Graphs

Authors: Lingren Wang, Wenxuan Tu, Jiaxin Wang, Xiong Wang, Jieren Cheng, Jingxin Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct extensive experiments on graph-level classification and clustering tasks in various cross-dataset and cross-domain scenarios to validate the superiority of FedIGL. The following research questions need to be validated. (RQ1) Can FedIGL achieve better performance compared to SOTA baselines? (RQ2) Does FedIGL converge under the constraints of bi-gradient optimization? (RQ3) How does each of the strategies we propose contribute to the final performance? (RQ4) How about the hyperparameter sensitivity of FedIGL?
Researcher Affiliation	Academia	1School of Information and Communication Engineering, Hainan University 2School of Computer Science and Technology, Hainan University 3School of Cyberspace Security, Hainan University EMAIL
Pseudocode	Yes	A Algorithm. Algorithm 1: Optimization process of FedIGL. Input: maximum epoch T; number of clients K; the distributed non-IID datasets {D_k}_{k=1}^{K}; hyper-parameters t, λ, β, ε. Output: model trained with FedIGL. Initialize {θ_FSG, θ_inv, θ_var}. for t = 1 to T do: for k = 1 to K do: obtain G_I, G_V for each graph G ∈ D_k with t by Eq. (4); obtain subgraph representations h_inv, h_var for G_I, G_V; calculate L^k_global for client k with λ, β, ε by Eq. (12); update {θ_FSG, θ_inv, θ_var} by stochastic gradient descent; fix the parameters of {θ_FSG, θ_inv, θ_var}; obtain the client-specific subgraph representation h_spec with G_V; calculate L^k_local for client k by Eq. (8); update the local model f_C^k for client k by stochastic gradient descent; end for. Aggregate {θ_FSG, θ_inv, θ_var}. end for.
Open Source Code	Yes	Furthermore, we provide the code of our proposed method in the supplementary material. We include the code of our proposed method in the supplementary material. The necessary environments and data preparation procedures are provided in the GitHub repository of our method.
Open Datasets	Yes	Benchmark Datasets. We employed a total of 19 diverse datasets across multiple domains to conduct comprehensive evaluations on both classification and clustering tasks. These domains include Small Molecules (e.g., MUTAG, BZR, COX2, DHFR, PTC_MR, AIDS, BZR_MD, and NCI1), Bioinformatics (e.g., DD, PROTEINS, OHSU, and Peking_1), Synthetic (SYNTHETIC), Social Networks (e.g., COLLAB, IMDB-MULTI, and IMDB-BINARY), and Computer Vision (e.g., Letter-high, Letter-low, and Letter-med). Regarding classification tasks, we follow the settings in [32], which include six distinct experimental designs: (1) cross-dataset setting utilizing seven small molecule datasets (SM), and (2)-(6) settings that incorporate both cross-dataset and cross-domain aspects, based on datasets from two different domains (BIO-SM, SM-CV) and three different domains (BIO-SM-SN, BIO-SN-CV, SM-SN-CV). For clustering tasks, we adopt the protocols in [22], including five types of non-IID settings: (1) 2 clusters within the same domain (SM), (2) 3 clusters within the same domain (SN), (3) 15 clusters within the same domain (CV), (4) 2 clusters across two domains (SM-BIO), and (5) 2 clusters across three domains (SM-BIO-SY). The dataset and experimental implementation details are provided in Appendix C.1.
Dataset Splits	Yes	Regarding classification tasks, we follow the settings in [32], which include six distinct experimental designs... For clustering tasks, we adopt the protocols in [22], including five types of non-IID settings... The dataset and experimental implementation details are provided in Appendix C.1.
Hardware Specification	Yes	To ensure fair comparisons, all methods, including FedIGL and baselines, were implemented in PyTorch and executed on the same NVIDIA GeForce RTX 3090 GPU.
Software Dependencies	No	To ensure fair comparisons, all methods, including FedIGL and baselines, were implemented in PyTorch and executed on the same NVIDIA GeForce RTX 3090 GPU.
Experiment Setup	Yes	For graph-level structure embeddings, we use a three-layer Graph Isomorphism Network (GIN) [51] with a hidden dimension of 64 and batch size of 128 [34]. Model optimization is performed using the Adam optimizer with a learning rate of 1e-3. Dropout is set to 0.5 and weight decay to 5e-4 to improve generalization. The results indicate that the optimal values of λ, β, and ε differ by task. For classification, λ = 0.05, β = 0.2, and ε = 0.1 provide the best performance balance; for clustering, λ = 0.2, β = 0.25, and ε = 0.15 perform best. This likely reflects distinct requirements for graph representations, suggesting that client subgraph invariance and variability are influenced by downstream tasks. The parameter τ governs the proportion of invariant subgraphs; a large τ may include excessive variant structures, while a small τ may limit structural capture. In our experiments, τ = 0.25 balances shared and client-specific structures effectively in both tasks.
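
For readers who want to reuse the values quoted in the Experiment Setup row, the following is a minimal sketch that collects them in one place. It is not taken from the authors' code: the names `ReportedSetup`, `reported_setup`, and the field names (`lam`, `beta`, `eps`, `tau`, and so on) are hypothetical and chosen only for illustration. The numeric values are those quoted in the row above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the Experiment Setup row.

    Class and field names are hypothetical; only the values come from the quote.
    """
    task: str
    lam: float                    # λ, task-dependent
    beta: float                   # β, task-dependent
    eps: float                    # ε, task-dependent
    tau: float = 0.25             # τ, proportion of invariant subgraphs (both tasks)
    gin_layers: int = 3           # three-layer GIN
    hidden_dim: int = 64
    batch_size: int = 128
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    dropout: float = 0.5
    weight_decay: float = 5e-4


# Task-specific loss weights reported as performing best.
_TASK_WEIGHTS = {
    "classification": {"lam": 0.05, "beta": 0.2, "eps": 0.1},
    "clustering": {"lam": 0.2, "beta": 0.25, "eps": 0.15},
}


def reported_setup(task: str) -> ReportedSetup:
    """Return the quoted settings for 'classification' or 'clustering'."""
    if task not in _TASK_WEIGHTS:
        raise ValueError(f"unknown task: {task!r}")
    return ReportedSetup(task=task, **_TASK_WEIGHTS[task])


if __name__ == "__main__":
    for name in ("classification", "clustering"):
        print(reported_setup(name))
```

Only λ, β, and ε differ between the two tasks in the quoted text, so the sketch keeps them as required fields and leaves the shared settings as defaults. Software versions are not listed because the paper excerpt does not state them.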