Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework

Authors: Yujie Xing, Xiao Wang, Bin Wu, Hai Huang, Chuan Shi

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across multiple benchmarks demonstrate that M3Dphormer achieves state-of-the-art performance, validating the effectiveness of our unified framework and model design.
Researcher Affiliation	Academia	1 Beijing University of Posts and Telecommunications, China; 2 Beihang University, China
Pseudocode	Yes	Algorithm 1: Masked Multi-Head Attention; Algorithm 2: Sparse Multi-Head Attention
Open Source Code	Yes	The source code is available for reproducibility at: https://github.com/null-xyj/M3Dphormer.
Open Datasets	Yes	We evaluate M3Dphormer on nine datasets, including six homophilic graphs (Cora, CiteSeer, Pubmed [45], Computer, Photo [33], and Ogbn-Arxiv [19]) and three heterophilic graphs (Squirrel, Chameleon, and Minesweeper [31]). The Cora, CiteSeer, PubMed [45], Photo, and Computer [33] datasets are available through PyG [14], while Ogbn-Arxiv can be accessed via the OGB platform [19]. The Chameleon, Squirrel, and Minesweeper datasets are provided in the official repository of [31].
Dataset Splits	Yes	For Computer and Photo, we follow the splitting protocol in [11, 15], randomly dividing nodes into training, validation, and test sets with a 60%:20%:20% ratio over five runs. For Ogbn-Arxiv, we adopt the official split provided in [19]. The remaining datasets are split into 50%:25%:25% train/validation/test sets, repeated five times following [42, 31].
Hardware Specification	Yes	All models are trained on a single NVIDIA GPU with 24GB memory.
Software Dependencies	No	The paper mentions using PyG [14] for GCN, SAGE, GAT, GPRGNN, and FAGCN, but does not provide specific version numbers for PyG or other key software components such as Python, PyTorch, or TensorFlow. It also mentions the Adam optimizer.
Experiment Setup	Yes	All hyperparameters are selected via grid search over the following search space. Learning rate: {$5 \times 10^{-4}$, $10^{-3}$, $5 \times 10^{-3}$}. Number of M3Dphormer layers: Cora, Citeseer, Pubmed, Chameleon, Photo: {2, 3, 4}; Squirrel, Computer, Ogbn-Arxiv: {5, 6, 7}; Minesweeper: {10, 12, 15}. Number of attention heads: {1, 2, 4, 8}. Hidden dimension: {64, 128, 256}. Weight decay: {0, $10^{-4}$, $5 \times 10^{-4}$, $10^{-3}$, $5 \times 10^{-3}$}. Dropout rate: {0.3, 0.5, 0.7}. Attention dropout rate: {0.1, 0.3, 0.5}. Number of clusters: Cora, Citeseer, Squirrel, Minesweeper: {96, 128, 160, 192}; Pubmed, Photo, Computer: {160, 192, 224, 256}; Ogbn-Arxiv: {2048}; Chameleon: {32, 64, 96, 128}.
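
The Dataset Splits row describes random train/validation/test partitions (60%:20%:20% for Computer and Photo, 50%:25%:25% for most other datasets), each repeated five times. The sketch below shows one way such a split could be drawn. It is an illustration only, not the authors' code: the function name `random_split`, the use of integer seeds 0 to 4 for the five runs, and the rule that rounding leftovers go to the test set are all assumptions.

```python
import random


def random_split(num_nodes, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition node indices into train/validation/test lists.

    Hypothetical helper: the seed handling and the rounding rule are
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)          # local RNG so runs are repeatable
    indices = list(range(num_nodes))
    rng.shuffle(indices)

    n_train = int(ratios[0] * num_nodes)
    n_val = int(ratios[1] * num_nodes)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # any rounding remainder lands here
    return train, val, test


# Five independent splits, one per run (seeds 0..4 are an assumption).
splits_60_20_20 = [random_split(1000, (0.6, 0.2, 0.2), seed=s) for s in range(5)]
splits_50_25_25 = [random_split(1000, (0.5, 0.25, 0.25), seed=s) for s in range(5)]
```

Fixing the seed per run makes each of the five partitions reproducible, which is the property the Dataset Splits variable is meant to capture.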
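
The search space quoted in the Experiment Setup row can be written down as data and enumerated, which makes the size of the grid explicit. The following is a minimal sketch under stated assumptions: the dictionary keys (`lr`, `num_heads`, and so on) and the helper names are hypothetical, the learning-rate and weight-decay values assume the exponents in the quoted text are negative, and nothing here is taken from the authors' repository.

```python
from itertools import product
from math import prod

# Values shared by all datasets (key names are illustrative assumptions).
SHARED = {
    "lr": [5e-4, 1e-3, 5e-3],
    "num_heads": [1, 2, 4, 8],
    "hidden_dim": [64, 128, 256],
    "weight_decay": [0, 1e-4, 5e-4, 1e-3, 5e-3],
    "dropout": [0.3, 0.5, 0.7],
    "attn_dropout": [0.1, 0.3, 0.5],
}

# Dataset-specific number of M3Dphormer layers.
LAYERS = {
    "Cora": [2, 3, 4], "Citeseer": [2, 3, 4], "Pubmed": [2, 3, 4],
    "Chameleon": [2, 3, 4], "Photo": [2, 3, 4],
    "Squirrel": [5, 6, 7], "Computer": [5, 6, 7], "Ogbn-Arxiv": [5, 6, 7],
    "Minesweeper": [10, 12, 15],
}

# Dataset-specific number of clusters.
CLUSTERS = {
    "Cora": [96, 128, 160, 192], "Citeseer": [96, 128, 160, 192],
    "Squirrel": [96, 128, 160, 192], "Minesweeper": [96, 128, 160, 192],
    "Pubmed": [160, 192, 224, 256], "Photo": [160, 192, 224, 256],
    "Computer": [160, 192, 224, 256],
    "Ogbn-Arxiv": [2048],
    "Chameleon": [32, 64, 96, 128],
}


def search_space(dataset):
    """Return the full hyperparameter grid for one dataset as a dict of lists."""
    space = dict(SHARED)
    space["num_layers"] = LAYERS[dataset]
    space["num_clusters"] = CLUSTERS[dataset]
    return space


def iter_configs(dataset):
    """Yield every hyperparameter combination in the grid as a dict."""
    space = search_space(dataset)
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def count_configs(dataset):
    """Number of combinations in the grid, without enumerating them."""
    return prod(len(v) for v in search_space(dataset).values())
```

Under this reading, `count_configs("Cora")` gives 19,440 combinations and `count_configs("Ogbn-Arxiv")` gives 4,860, since Ogbn-Arxiv has a single cluster setting. Whether the authors searched the grid exhaustively is not stated in the quoted text.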