Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Relieving the Over-Aggregating Effect in Graph Transformers

Authors: Junshu Sun, Wanxing Chang, Chenxue Yang, Qingming Huang, Shuhui Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Evaluations show that Wideformer can effectively mitigate over-aggregating. As a result, the backbone methods can focus on the informative messages, achieving superior performance compared to baseline methods. We demonstrate the effectiveness of Wideformer on thirteen real-world datasets, which consistently relieves over-aggregating and benefits backbones to achieve superior performance.
Researcher Affiliation	Collaboration	(1) State Key Lab. of AI Safety, ICT, CAS; (2) University of Chinese Academy of Sciences; (3) DAMO Academy, Alibaba Group; (4) Agriculture Information Institute, CAAS
Pseudocode	Yes	Algorithm 1 Center-selection Function Cluster
Open Source Code	Yes	Codes are available at https://github.com/sunjss/over-aggregating.
Open Datasets	Yes	We adopt thirteen real-world datasets, including both heterophilic and homophilic graphs. For baseline methods, both GNNs and graph transformers are adopted. Please refer to Appendix D.2 for detailed experimental setups. The attention entropy experiments in Fig. 1, Fig. 3, and Fig. 5 are conducted on Cora, Amazon Photo, Amazon Computers, Coauthor CS, Coauthor Physics, tolokers, amazon-ratings, minesweeper, and ogb-arxiv. The cluster ablation study in Fig. 7 is conducted and averaged on all the datasets included in this paper.
Dataset Splits	No	The paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits from citations for their experiments). It mentions dataset names which might have standard splits, but this is not explicitly stated in the paper's experimental setup.
Hardware Specification	Yes	The framework is implemented with PyTorch [31] and PyTorch Geometric [12], and trained on a single NVIDIA A100.
Software Dependencies	No	The framework is implemented with PyTorch [31] and PyTorch Geometric [12], and trained on a single NVIDIA A100. (Specific version numbers for PyTorch or PyTorch Geometric are not provided.)
Experiment Setup	Yes	We perform grid search based on the validation performance of the models as follows: GraphGPS. We search the number of graph transformer layers in {1, ..., 6}, the number of hidden dimensions in {64, 80, 128, 256}, the number of heads in {1, 2, 4}, dropout in {0.1, 0.2, 0.3, 0.5}, and learning rate in {5e-4, 1e-3, 1e-2}. The rest hyperparameters are fixed as in the original implementation. SGFormer. The GNN backbone is implemented as GCN [17]. The number of GCN layers is searched in {1, ..., 10}, the number of hidden dimensions in {64, 80, 128, 256}, the number of heads in {1, 2, 4}, dropout in {0.1, 0.2, 0.3, 0.5}, and learning rate in {1e-3, 5e-3, 1e-2}. The rest hyperparameters are fixed as in the original implementation. Polynormer. The GNN backbone is implemented as GAT [48]. We search the number of graph transformer layers in {1, ..., 6}, the number of GAT layers in {1, ..., 10}, the number of hidden dimensions in {64, 80, 128, 256}, the number of heads in {1, 2, 4, 8}, dropout in {0.1, 0.2, 0.3, 0.5}, and learning rate in {5e-4, 1e-3}. The rest hyperparameters are fixed following the original implementation.
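
The grid search quoted in the Experiment Setup row can be written down as explicit search spaces. The following is a minimal sketch, not the authors' code: the dictionary keys, the helper names (`iter_configs`, `grid_size`, `grid_search`), and the `evaluate` callback are hypothetical, and the layer ranges are assumed to be inclusive integer ranges (1 through 6 and 1 through 10). It only enumerates configurations and picks the one with the best validation score; training itself is left to the caller.

```python
from itertools import product

# Search spaces transcribed from the Experiment Setup row.
# Key names are illustrative; layer ranges are assumed to be inclusive.
HIDDEN_DIMS = [64, 80, 128, 256]
DROPOUTS = [0.1, 0.2, 0.3, 0.5]

SEARCH_SPACES = {
    "GraphGPS": {
        "num_transformer_layers": list(range(1, 7)),
        "hidden_dim": HIDDEN_DIMS,
        "num_heads": [1, 2, 4],
        "dropout": DROPOUTS,
        "lr": [5e-4, 1e-3, 1e-2],
    },
    "SGFormer": {
        "num_gcn_layers": list(range(1, 11)),
        "hidden_dim": HIDDEN_DIMS,
        "num_heads": [1, 2, 4],
        "dropout": DROPOUTS,
        "lr": [1e-3, 5e-3, 1e-2],
    },
    "Polynormer": {
        "num_transformer_layers": list(range(1, 7)),
        "num_gat_layers": list(range(1, 11)),
        "hidden_dim": HIDDEN_DIMS,
        "num_heads": [1, 2, 4, 8],
        "dropout": DROPOUTS,
        "lr": [5e-4, 1e-3],
    },
}


def iter_configs(space):
    """Yield every hyperparameter combination in a search space as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_size(space):
    """Return the number of configurations in a search space."""
    size = 1
    for candidates in space.values():
        size *= len(candidates)
    return size


def grid_search(space, evaluate):
    """Return the configuration with the highest validation score.

    `evaluate` is a caller-supplied function mapping a config dict to a
    validation metric (higher is better).
    """
    return max(iter_configs(space), key=evaluate)


if __name__ == "__main__":
    for name, space in SEARCH_SPACES.items():
        print(name, grid_size(space))
```

Under these assumptions the full grids are sizeable (864 configurations for GraphGPS, 1440 for SGFormer, and 7680 for Polynormer), which is worth keeping in mind when budgeting a reproduction on a single GPU.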