Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Graphs Help Graphs: Multi-Agent Graph Socialized Learning

Authors: Jialu Li, Yu Wang, Pengfei Zhu, Wanyu Lin, Xinjie Yao, Qinghua Hu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of GHG in heterogeneous dynamic graphs by an extensive empirical study. The code is available through https://github.com/Jillian555/GHG. 5 Experiments; 5.1 Datasets and Setups; 5.2 Performance Comparison; 5.3 Ablation Study; 5.4 Further Analysis.
Researcher Affiliation	Academia	(1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (4) Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Pseudocode	Yes	The pseudo-code can be found in Appendix C.1. Algorithm 1: Training of GHG. Algorithm 2: Inference in GHG.
Open Source Code	Yes	The code is available through https://github.com/Jillian555/GHG.
Open Datasets	Yes	We assess the effectiveness of GHG by leveraging seven publicly available datasets, and their statistical details are presented in Appendix C.2. CoraFull [48], Arxiv [49], and Reddit [50] include 7, 4, and 4 tasks, respectively, each containing 10 classes. Cora [51] and CiteSeer [51] both comprise 3 tasks with 2 classes per task. SLAP [52] and Computers [53] include 3 and 2 tasks, respectively, each with 5 classes.
Dataset Splits	Yes	For dataset splits, CoraFull, Arxiv, and Reddit have 60% training, 20% validation, and 20% testing, while Cora, CiteSeer, SLAP, and Computers have 20% training, 40% validation, and 40% testing.
Hardware Specification	Yes	Our model is deployed in PyTorch and on an NVIDIA RTX 3090 GPU.
Software Dependencies	No	Our model is deployed in PyTorch and on an NVIDIA RTX 3090 GPU. (The paper mentions PyTorch without a version number and gives no versions for any other software.)
Experiment Setup	Yes	Our model is deployed in PyTorch and on an NVIDIA RTX 3090 GPU. We use Adam with weight decay, setting the learning rate to 0.005 and training for 50 epochs. For graph synthesis of each agent, we employ Adam with a learning rate of 0.001 or 0.005 and 50 epochs per interaction round. The graph prompt settings follow those of TPP [20], with 3 Laplacian steps and three prompts.
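
As an illustration of the Dataset Splits row, the Python sketch below shows one way the reported 60/20/20 and 20/40/40 proportions could be turned into node-index splits. The function name `split_indices`, the fixed seed, and the uniform random shuffle are assumptions made for this example. The table does not say how the paper actually assigns nodes to splits, so the authors' procedure may differ.

```python
import random

# Split ratios (train, validation, test) as quoted in the Dataset Splits row.
# The dictionary keys are hypothetical labels chosen for this sketch.
SPLIT_RATIOS = {
    "60/20/20": (0.6, 0.2, 0.2),
    "20/40/40": (0.2, 0.4, 0.4),
}


def split_indices(num_nodes, ratios, seed=0):
    """Shuffle node indices and cut them into train/validation/test lists.

    Assumption: a uniform random split with a fixed seed. This is not
    taken from the paper; it only illustrates the reported proportions.
    """
    train_ratio, val_ratio, _ = ratios
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)

    n_train = round(num_nodes * train_ratio)
    n_val = round(num_nodes * val_ratio)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test split
    return train, val, test


if __name__ == "__main__":
    for name, ratios in SPLIT_RATIOS.items():
        train, val, test = split_indices(1000, ratios)
        print(name, len(train), len(val), len(test))
```

Fixing the seed makes the split repeatable across runs, which is the kind of detail the Experiment Setup and Dataset Splits variables are meant to capture.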