Reproducibility Index

Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that is validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

A Closer Look at Graph Transformers: Cross-Aggregation and Beyond

Authors: Jiaming Zhuo, Ziyi Ma, Yintong Lu, Yuwei Liu, Kun Fu, Di Jin, Chuan Wang, Wu Wenning, Zhen Wang, Xiaochun Cao, Liang Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations on multiple benchmark datasets demonstrate the effectiveness and efficiency of UGCFormer. ... 4 Experiments This section evaluates the effectiveness and universality of the proposed UGCFormer by comparing its performances against various diverse graph learning models on the node classification task. ... 4.1 Experimental Results Homophilic Graphs. The experiment results for node classification on homophilic graphs are shown in Tab. 1, ... Heterophilic Graphs. Tab. 2 shows the results of the node classification task on seven heterophilic graphs, ... Scalability Study. To evaluate the scalability of the proposed UGCFormer, this experiment quantitatively changes the network size and records the running time and GPU memory usage. ... Node Property Prediction. This experiment seeks to evaluate the effectiveness and scalability of GTs by comparing them with GNNs on two large-scale benchmark datasets. ... 4.2 Additional Analysis Ablation Study. This experiment evaluates the contributions of the proposed cross-attention module and the consistency constraint by comparing UGCFormer with two variants lacking these components. Fig. 3 shows that these variants consistently underperform UGCFormer across the four datasets. Parameter Sensitivity Analysis. These experiments aim to provide an intuitive understanding for the selection of hyperparameters. Performance changes due to varying the number of layers (l) and layer dimensions (d) are shown in Figs. 4 and 5, respectively.
Researcher Affiliation	Academia	1Hebei Province Key Laboratory of Big Data Calculation, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China 2College of Intelligence and Computing, Tianjin University, Tianjin, China 3School of Computer Science and Technology, Beijing Jiao Tong University, Beijing, China 4School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), School of Cybersecurity, Northwestern Polytechnical University, Xi'an, China 5School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	The detailed implementation is provided in Algorithm 1. ... Algorithm 1: UGCFormer ... Algorithm 2: PyTorch-style Code for DCA layer
Open Source Code	Yes	5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have included complete and executable code within the supplemental material, ensuring the reproducibility of our results.
Open Datasets	Yes	E.1 Datasets and Splitting Datasets. In the node classification experiments, sixteen publicly available benchmark datasets are utilized. These graphs can be classified into two categories based on whether their Edge Homophily [37] exceeds 0.5: seven graphs are tagged as homophilic graphs, including Cora [42], CiteSeer [42], PubMed [42], Photo [43], CS [43], Physics [43], and Questions [38]. The remaining seven graphs are marked as heterophilic graphs, containing Cornell [37], Texas [37], Wisconsin [37], Actor [45], Chameleon [40], Squirrel [40], and Ratings [38]. ... Additionally, two large-scale graph datasets, i.e., ogbn-arxiv [23] and ogbn-proteins [23], are employed for node property prediction experiment. Statistics are shown in Tab. 4.
Dataset Splits	Yes	E.1 Datasets and Splitting Dataset Splitting. To ensure that the experimental results are credible and reproducible, this paper follows well-established dataset splitting strategies. For the Cora, CiteSeer, and PubMed, the public standard splitting described in [29] is adopted, with 20 nodes per class for training, 500 for validation, and 1000 for testing. The Photo, CS, and Physics are randomly divided into training, validation, and testing sets in a 60%, 20%, and 20% ratio, respectively. For the heterophilic datasets Cornell, Texas, Wisconsin, Actor, and Chameleon, this paper employs 10 standard train/validation/test splits with a division ratio of 48%, 32%, and 20%, respectively. Note that the Chameleon and Squirrel used here are duplicates-removed filtered versions as referenced in [38]. The Ratings and Questions follow a 50%/25%/25% train/validation/test random split pattern. For the two datasets from the OGB [23], i.e., ogbn-arxiv and ogbn-proteins, the provided standard splits are utilized.
Hardware Specification	Yes	E.3 Experimental Setups Configurations. The experiment is performed on two Linux machines using a single GeForce RTX4090 24 GB GPU and a single NVIDIA A800 80GB GPU, respectively.
Software Dependencies	No	E.2.2 Graph Transformers (GTs) The following specifies the GT baselines utilized in our comparative analysis. ... For the four GNN baselines, including GCN, GAT, GraphSAGE, and APPNP, we utilize the public library, PyTorch Geometric (PyG) [13], for their implementation. For the other three GNN baselines, we utilize their original code.
Experiment Setup	Yes	E.3 Experimental Setups ... Hyper-parameters. The hyperparameters are selected via a grid search strategy. In the node classification task, models are trained employing an Adam optimizer with the learning rate among {0.001, 0.005, 0.01} and the weight decay among {0, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2}. The number of layers is selected from {1, 2, 3, 4, 5}, and the dimension of hidden layers is chosen from {64, 128, 256, 512}, and their impacts on model performance are analyzed in Section 4.2. For the node property prediction task, the hyperparameter selection follows the baseline [50]. For the unique hyperparameters in UGCFormer, α is chosen from a range starting at 0.1 and increasing by increments of 0.1, up to 0.9, β is selected from {0.001, 0.01, 0.1, 1}, and τ is fixed to 0.5. Refer to Tab. 5 for the chosen parameters that correspond to the reported results.
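
The Open Datasets row says graphs are tagged homophilic when their edge homophily exceeds 0.5. As a rough illustration, the sketch below assumes the common definition of edge homophily (the fraction of edges whose endpoints share a label); the quoted excerpt does not spell the formula out, and the function names `edge_homophily` and `homophily_tag` are hypothetical, not taken from the paper's code.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label.

    Assumed definition; the quoted excerpt only names the measure.
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def homophily_tag(edges, labels, threshold=0.5):
    """Tag a graph as homophilic only if the ratio strictly exceeds the threshold."""
    if edge_homophily(edges, labels) > threshold:
        return "homophilic"
    return "heterophilic"


# Toy graph: 4 nodes, labels per node index, undirected edges as pairs.
toy_labels = [0, 0, 1, 1]
toy_edges = [(0, 1), (2, 3), (1, 2)]
toy_ratio = edge_homophily(toy_edges, toy_labels)
toy_tag = homophily_tag(toy_edges, toy_labels)
```

A ratio of exactly 0.5 falls on the heterophilic side here, because the excerpt says the measure must exceed 0.5.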
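
The Dataset Splits row describes random ratio splits (60%/20%/20% for Photo, CS, and Physics; 50%/25%/25% for Ratings and Questions). The sketch below shows one plausible way to draw such a split over node indices. The helper name `random_split`, the seeding scheme, and the rounding rule are assumptions for illustration; the paper's supplemental code may do this differently.

```python
import random


def random_split(num_nodes, train_ratio, val_ratio, seed=0):
    """Shuffle node indices and cut them into train/val/test parts.

    The test set receives whatever remains after train and val.
    """
    rng = random.Random(seed)  # local RNG so the split is repeatable
    indices = list(range(num_nodes))
    rng.shuffle(indices)
    n_train = round(train_ratio * num_nodes)
    n_val = round(val_ratio * num_nodes)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 60/20/20 split as quoted for Photo, CS, and Physics (node count is a toy value).
train_idx, val_idx, test_idx = random_split(100, 0.6, 0.2, seed=0)
```

Fixing the seed makes a split repeatable, which matters when results are averaged over several random splits.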
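
The search space quoted in the Experiment Setup row can be written down as a small grid. The sketch below enumerates it under the assumption that all listed ranges are combined as a full Cartesian product, which the excerpt does not state. The dictionary keys and the helpers `iter_configs` and `grid_size` are hypothetical names.

```python
from itertools import product

# Values copied from the quoted node classification setup.
SEARCH_SPACE = {
    "lr": [0.001, 0.005, 0.01],
    "weight_decay": [0, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "num_layers": [1, 2, 3, 4, 5],
    "hidden_dim": [64, 128, 256, 512],
    "alpha": [round(0.1 * k, 1) for k in range(1, 10)],  # 0.1, 0.2, ..., 0.9
    "beta": [0.001, 0.01, 0.1, 1],
    "tau": [0.5],  # fixed in the paper
}


def iter_configs(space):
    """Yield one dict per point of the Cartesian product of all value lists."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_size(space):
    """Number of configurations in the full grid."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


total = grid_size(SEARCH_SPACE)
first_config = next(iter_configs(SEARCH_SPACE))
```

Under the full-product assumption the grid has 17,280 points per dataset, so a real search may well have been staged or pruned. The excerpt gives no detail on this beyond pointing to Tab. 5 for the chosen values.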