Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Commute Graph Neural Networks

Authors: Wei Zhuo, Han Yu, Guang Tan, Xiaoxiao Li

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments to evaluate the effectiveness of CGNN on eight digraph datasets. Experimental details and data statistics are provided in Appendix C.1 and Appendix C.2. Table 1 reports the node classification results across eight digraph datasets. Our method CGNN achieves new state-of-the-art results on 6 out of 8 datasets, and comparable results on the remaining ones, validating the superiority of CGNN. Figure 4 compares the accuracy of different models along with running times. Section 5.3 Component Analysis
Researcher Affiliation	Academia	Wei Zhuo 1 2 Han Yu 2 Guang Tan 1 Xiaoxiao Li 3 4 Most of this work was done when Wei Zhuo <EMAIL> was with Shenzhen Campus of Sun Yat-sen University. 1Shenzhen Campus of Sun Yat-sen University, China 2Nanyang Technological University, Singapore 3The University of British Columbia, Canada 4Vector Institute, Canada. Correspondence to: Guang Tan <EMAIL>.
Pseudocode	Yes	Algorithm 1 CGNN Input: Digraph G = (V, E, X); Depth L; Hidden size d; Number of classes K Output: Logits Ŷ ∈ ℝ^{N×K}
Open Source Code	No	The paper does not explicitly state that the source code for the methodology is available, nor does it provide a link to a code repository.
Open Datasets	Yes	The datasets used in Section 5 are Squirrel, Chameleon (Rozemberczki et al., 2021), Citeseer (Sen et al., 2008), Cora ML (Bojchevski & Günnemann, 2017), AM-Photo (Shchur et al., 2018), Snap-Patents, Roman-Empire, and Arxiv-Year (Rossi et al., 2023).
Dataset Splits	Yes	For Squirrel and Chameleon, we use 10 public splits (48%/32%/20% for training/validation/testing) provided by (Pei et al., 2019). For the remaining datasets, we adopt the same splits as (Tong et al., 2020a; 2021), which chooses 20 nodes per class for the training set, 500 for the validation set, and allocates the rest to the test set.
Hardware Specification	Yes	We conduct our experiments on 2 Intel Xeon Gold 5215 CPUs and 1 NVIDIA GeForce RTX 3090 GPU.
Software Dependencies	No	The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) for its implementation.
Experiment Setup	Yes	We utilize the randomized truncated SVD algorithm for computing the Moore-Penrose pseudoinverse of matrix R, setting the required rank q to 5 for all datasets. The learning rate lr is selected from {0.01, 0.005}, and the weight decay wd from {0, 5e-5, 5e-4}. In the model architecture, the number of layers L vary among {1, 2, 3, 4, 5} and the dimension d is selected from {32, 64, 128, 256, 512}. The comprehensive hyperparameter configurations for CGNN are detailed in Table 8.
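
Because no source code is released, the Experiment Setup row's pseudoinverse step has to be reimplemented by anyone reproducing the paper. The following is a minimal sketch of one way to approximate a rank-q Moore-Penrose pseudoinverse with a randomized truncated SVD, using only NumPy. It is not the authors' implementation: the function name `randomized_truncated_pinv`, the oversampling and power-iteration parameters, and the random seed are all assumptions made for illustration; only the matrix name R and the rank q = 5 come from the text above.

```python
import numpy as np


def randomized_truncated_pinv(R, q=5, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the rank-q pseudoinverse of R (hypothetical helper).

    n_oversamples, n_power_iter and seed are illustrative assumptions,
    not values reported in the paper.
    """
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    rng = np.random.default_rng(seed)

    # Sketch the column space of R with a Gaussian test matrix.
    k = min(q + n_oversamples, m, n)
    Q, _ = np.linalg.qr(R @ rng.standard_normal((n, k)))

    # A few power iterations sharpen the captured subspace.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(R.T @ Q)
        Q, _ = np.linalg.qr(R @ Z)

    # Exact SVD of the small projected matrix, then keep the top q triplets.
    U_small, s, Vt = np.linalg.svd(Q.T @ R, full_matrices=False)
    U = (Q @ U_small)[:, :q]
    s, Vt = s[:q], Vt[:q]

    # Invert only singular values that are numerically nonzero.
    tol = s.max() * max(m, n) * np.finfo(float).eps
    s_inv = np.zeros_like(s)
    mask = s > tol
    s_inv[mask] = 1.0 / s[mask]

    # pinv(R) is approximately V diag(1/s) U^T, restricted to rank q.
    return (Vt.T * s_inv) @ U.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-in for R: an exactly rank-5 matrix.
    R = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
    R_pinv = randomized_truncated_pinv(R, q=5)
    print(R_pinv.shape)
```

For a matrix whose rank exceeds q, the result is the pseudoinverse of an approximate best rank-q fit rather than of R itself, so the choice q = 5 acts as a regularizer as well as a cost saving.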