Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GAMMA: Gated Multi-hop Message Passing for Homophily-Agnostic Node Representation in GNNs

Authors: Amir Ghazizadeh, Rickard Ewetz, Hao Zheng

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show GAMMA matches or exceeds state-of-the-art heterophilic GNN accuracy, achieving up to 20% faster inference. Our code is publicly available at https://github.com/amir-ghz/GAMMA. ... We evaluate GAMMA on the semi-supervised node classification task across a diverse set of benchmark datasets, encompassing both homophilic and heterophilic graphs, with detailed results presented in Table 1.
Researcher Affiliation	Academia	(1) Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, USA; (2) Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
Pseudocode	Yes	For more details about GAMMA and the algorithm pseudocode, refer to Appendix C and Algorithm 1.
Open Source Code	Yes	Our code is publicly available at https://github.com/amir-ghz/GAMMA.
Open Datasets	Yes	We evaluate GAMMA on the semi-supervised node classification task across a diverse set of benchmark datasets, encompassing both homophilic and heterophilic graphs, with detailed results presented in Table 1. ... All models were implemented in Python using PyTorch Geometric [7].
Dataset Splits	Yes	Specifically, for the heterophilic benchmark datasets, we utilize the 10 fixed train/validation/test splits [31]. These splits allocate 50% of nodes for training, 25% for validation, and 25% for testing, reflecting a common practice... For the homophilic datasets, we also employ 10 distinct random splits, following the setup in [40], with 48% of nodes for training, 32% for validation, and 20% for testing.
Hardware Specification	Yes	Experiments were conducted on a desktop machine equipped with an NVIDIA RTX A2000 GPU (12GB VRAM) [27].
Software Dependencies	Yes	CUDA 12.8 facilitated GPU acceleration and NVIDIA Nsight Compute CLI [28] was employed for profiling and computational performance evaluations such as memory consumptions and runtime. ... NVIDIA Nsight Compute CLI (v2025.2.0) User Guide, 2025.
Experiment Setup	Yes	For a fair and rigorous comparison of both predictive accuracy and computational demands, all models, including baselines, were configured with a consistent architecture: two GNN layers and a fixed hidden dimension size of 32. Hyperparameters for each model were optimized via a grid search over learning rates in {0.05, 0.01, 0.002} and dropout rates {0.0, 0.5}. Across all datasets and splits, models were trained for a fixed 500 epochs.
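
The Dataset Splits and Experiment Setup rows describe a small search space and a simple splitting scheme, which can be sketched in a few lines. The following is an illustrative sketch only, not the authors' code: the function names, the use of the split index as the random seed, and the rounding of split sizes are all assumptions. It regenerates random splits as described for the homophilic datasets (48% / 32% / 20%); the heterophilic datasets use 10 fixed splits from [31], which would be loaded rather than regenerated.

```python
import itertools
import random

# Values quoted in the Experiment Setup row of the table.
LEARNING_RATES = [0.05, 0.01, 0.002]
DROPOUT_RATES = [0.0, 0.5]
NUM_LAYERS = 2
HIDDEN_DIM = 32
NUM_EPOCHS = 500


def hyperparameter_grid():
    """Return every (learning rate, dropout) configuration in the grid."""
    return [
        {
            "lr": lr,
            "dropout": dropout,
            "num_layers": NUM_LAYERS,
            "hidden_dim": HIDDEN_DIM,
            "epochs": NUM_EPOCHS,
        }
        for lr, dropout in itertools.product(LEARNING_RATES, DROPOUT_RATES)
    ]


def random_split(num_nodes, train_frac, val_frac, seed):
    """Shuffle node indices and cut them into train/validation/test lists.

    The test list receives whatever remains after train and validation.
    Seeding and rounding are assumptions made for this sketch.
    """
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * num_nodes)
    n_val = round(val_frac * num_nodes)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


def make_splits(num_nodes, train_frac=0.48, val_frac=0.32, num_splits=10):
    """Build num_splits random splits, using the split index as the seed."""
    return [
        random_split(num_nodes, train_frac, val_frac, seed)
        for seed in range(num_splits)
    ]
```

Under these assumptions, each model would be trained once per grid configuration and per split, with the configuration chosen on the validation nodes; the selection criterion itself is not stated in the quoted text.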