Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers
Authors: Guoguo Ai, Guansong Pang, Hezhe Qiao, Yuan Gao, Hui Yan
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on 11 real-world node classification datasets across various domains, scales, and graph properties, as well as 5 graph classification datasets, show that GrokFormer outperforms state-of-the-art GTs and GNNs. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China 2School of Computing and Information Systems, Singapore Management University, Singapore. Correspondence to: Hui Yan <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and textual descriptions in sections like '4. Methodology' and '4.2. Network Architecture of GrokFormer', but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/GGA23/GrokFormer. |
| Open Datasets | Yes | We conduct node classification experiments on 11 widely used datasets in previous graph spectral models (Bo et al., 2023; He et al., 2021; Deng et al., 2024), including six homophilic datasets, i.e., Cora, Citeseer, Pubmed, the Amazon co-purchase graph Photo (He et al., 2021), an extracted subset of Wikipedia's Computer Science articles Wiki-CS (Dwivedi et al., 2023), and a co-authorship network Physics (Shchur et al., 2018; Chen et al., 2024a). We also evaluate on five heterophilic datasets, i.e., Wikipedia graphs Chameleon and Squirrel, the Actor co-occurrence graph (Pei et al., 2020), webpage graphs Texas from WebKB, and Penn94, a large-scale friendship network from the Facebook 100 (Lim et al., 2021). ... We also conduct graph classification experiments on five TU benchmarks from diverse domains. They include three bioinformatics graph datasets, i.e., PROTEINS (Borgwardt et al., 2005), PTC-MR (Toivonen et al., 2003), and MUTAG (Debnath et al., 1991) and two social network datasets, i.e., IMDB-BINARY and IMDB-MULTI (Yanardag & Vishwanathan, 2015)... In this section, we conduct experiments on additional graph-level datasets, including a subset (12K) of the ZINC molecular graphs (250K) dataset (Irwin et al., 2012), the super-pixels dataset CIFAR10 (Dwivedi et al., 2023), and a long-range graph benchmark Peptides-func (Dwivedi et al., 2022). |
| Dataset Splits | Yes | Following the previous works (He et al., 2021; Huang et al., 2024; Bo et al., 2023), we randomly split the node set into train/validation/test set with ratio 60%/20%/20%, and generate 10 random splits to evaluate all models on the same splits. ... For the large-scale dataset Penn94, (Lim et al., 2021) provides five official splits, so we run it five times to report the mean accuracy. For other datasets, we run the experiments ten times, each with a different random split. ... Following (Sun et al., 2020), we perform 10-fold cross validation. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA GeForce RTX 3090 GPUs with 24 GB memory and TITAN Xp GPU machines equipped with 12 GB memory. |
| Software Dependencies | No | For the implementation, we utilize NetworkX, PyTorch, and PyTorch Geometric for model construction. ... We train all models with the Adam optimizer (Diederik & Ba, 2015)... While specific software libraries are mentioned (NetworkX, PyTorch, PyTorch Geometric), no version numbers are provided for these dependencies. |
| Experiment Setup | Yes | The hyper-parameter ranges we used for tuning on each dataset are as follows: Number of layers: {1, 2, 3}; Number of Fourier series expansion terms: {16, 32, 64}; Number of heads: {1, 2, 3, 4, 5}; Hidden dimension: {64, 128}; Learning rate: {0.01, 0.005}; Number of K: {1, 2, 3, 4, 5, 6, 10}; Weight decays: {5e-3, 5e-4, 5e-5}; Dropout rates: {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}. ... Hyperparameter selection range is as follows: Number of layers: {1, 2}; Epoch: {100, 200, 300}; Learning rates: {0.01, 0.005, 0.001}; Weight decay: {0.0, 0.0005, 0.00005}; Dropout rate: {0.0, 0.05, 0.1}; Number of Fourier series expansion terms: {16, 32, 64}; Hidden dim: {32, 64, 128}; Number of K: {1, 2, 3, 4, 5, 6}; Batch size: {128}; Internal MPGNN: {GCN, GatedGCN (Bresson & Laurent, 2017)}. |
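The splitting protocol quoted under Dataset Splits (random 60%/20%/20% node splits, repeated over 10 random seeds) can be sketched as below. This is an illustrative reconstruction, not the paper's released code; the function name and return structure are assumptions.

```python
import numpy as np

def random_node_splits(num_nodes, train_ratio=0.6, val_ratio=0.2,
                       num_splits=10, seed=0):
    """Generate random train/val/test node splits (60%/20%/20% by default).

    Each split is one fresh permutation of the node indices, mirroring the
    "10 random splits, all models evaluated on the same splits" protocol.
    """
    rng = np.random.default_rng(seed)
    n_train = int(train_ratio * num_nodes)
    n_val = int(val_ratio * num_nodes)
    splits = []
    for _ in range(num_splits):
        perm = rng.permutation(num_nodes)
        splits.append({
            "train": perm[:n_train],
            "val": perm[n_train:n_train + n_val],
            "test": perm[n_train + n_val:],   # remaining ~20% of nodes
        })
    return splits
```

Fixing the generator seed makes the 10 splits reproducible across models, which is what "evaluate all models on the same splits" requires.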
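The node-classification tuning ranges quoted under Experiment Setup amount to an exhaustive grid search. A minimal sketch of enumerating that grid is below; the dictionary key names are illustrative assumptions, not identifiers from the released code.

```python
from itertools import product

# Node-classification ranges quoted above (key names are hypothetical).
search_space = {
    "num_layers": [1, 2, 3],
    "fourier_terms": [16, 32, 64],
    "num_heads": [1, 2, 3, 4, 5],
    "hidden_dim": [64, 128],
    "lr": [0.01, 0.005],
    "K": [1, 2, 3, 4, 5, 6, 10],
    "weight_decay": [5e-3, 5e-4, 5e-5],
    "dropout": [round(0.1 * i, 1) for i in range(9)],  # 0.0 .. 0.8
}

def grid_configs(space):
    """Yield every hyper-parameter combination as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```

The full grid here has 3 x 3 x 5 x 2 x 2 x 7 x 3 x 9 = 34,020 combinations per dataset, which is why papers typically report only the ranges rather than the selected values for every dataset.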