Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Walking Out of the Weisfeiler Leman Hierarchy: Graph Learning Beyond Message Passing

Authors: Jan Tönshoff, Martin Ritzert, Hinrikus Wolf, Martin Grohe

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show that CRaWl matches state-of-the-art GNN architectures across a multitude of benchmark datasets for classification and regression on graphs. We carry out a comprehensive experimental analysis, demonstrating that empirically CRaWl is on par with advanced message passing GNN architectures and graph transformers on major graph learning benchmark datasets and excels when it comes to long-range interactions."
Researcher Affiliation | Academia | Jan Tönshoff (EMAIL), RWTH Aachen University; Martin Ritzert (EMAIL), Georg-August-Universität Göttingen; Hinrikus Wolf (EMAIL), RWTH Aachen University; Martin Grohe (EMAIL), RWTH Aachen University
Pseudocode | No | The paper describes the architecture and processes, and Figure 1 depicts an example of the information flow, but it does not contain a formal pseudocode block or algorithm steps explicitly labeled as such.
Open Source Code | Yes | "We implemented CRaWl in PyTorch (Paszke et al., 2019; Fey & Lenssen, 2019)." [Footnote 1: https://github.com/toenshoff/CRaWl]
Open Datasets | Yes | "We evaluate CRaWl on a range of standard graph learning benchmark datasets obtained from Dwivedi et al. (2020), Hu et al. (2020), and Dwivedi et al. (2022). From the OGB project (Hu et al., 2020), we use the molecular property prediction dataset MOLPCBA... Further, we use four datasets from Dwivedi et al. (2020). The first dataset ZINC... The datasets CIFAR10 and MNIST... The last dataset CSL... We conduct additional experiments on the long-range graph benchmark (Dwivedi et al., 2022)... PASCALVOC-SP... PEPTIDES-FUNC and PEPTIDES-STRUCT..."
Dataset Splits | Yes | "On MOLPCBA, the performance is measured in terms of the average precision (AP). Additionally, it provides a train/val/test split that separates structurally different types of molecules for a more realistic experimental setting... For each dataset we report the mean and standard deviation across several models trained with different random seeds. We follow the standardized procedure for each dataset and average over 10 models for MOLPCBA, 5 models for ZINC, CIFAR10, and MNIST and 4 models for PASCALVOC-SP, PEPTIDES-FUNC, and PEPTIDES-STRUCT. ... Unlike the other benchmark datasets provided by Dwivedi et al. (2020), CSL is evaluated with 5-fold cross-validation. We use the 5-fold split Dwivedi et al. (2020) provide in their repository."
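The CSL evaluation quoted above uses a fixed 5-fold cross-validation split. As a minimal sketch of the mechanics of such a split: the paper itself loads the precomputed split files from the Dwivedi et al. (2020) repository, so the function below is purely illustrative.

```python
# Illustrative 5-fold splitter; the actual experiments use the fixed split
# files from the Dwivedi et al. (2020) repository rather than generating folds.
def five_fold_splits(n):
    """Yield (train_indices, test_indices) for each of the 5 folds."""
    indices = list(range(n))
    fold_size = n // 5
    for k in range(5):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

# CSL contains 150 graphs, so each fold holds out 30 of them for testing.
splits = list(five_fold_splits(150))
```

Each graph appears in exactly one test fold, so the five test sets partition the dataset.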
Hardware Specification | Yes | "All experiments were run on a machine with 64GB RAM, an Intel Xeon 8160 CPU and an Nvidia Tesla V100 GPU with 16GB GPU memory."
Software Dependencies | No | "We implemented CRaWl in PyTorch (Paszke et al., 2019; Fey & Lenssen, 2019)." The specific version number for PyTorch is not provided in the text.
Experiment Setup | Yes | "We adopt the training procedure specified by Dwivedi et al. (2020). In particular, the learning rate is initialized as 10^-3 and decays with a factor of 0.5 if the performance on the validation set stagnates for 10 epochs. The training stops once the learning rate falls below 10^-6. Dwivedi et al. (2020) also specify that networks need to stay within parameter budgets of either 100k or 500k parameters. For all datasets we use a walk length of ℓ = 50 during training. For evaluation we increase this number to ℓ = 150... The window size s was chosen to be 8 for all but the long-range datasets where we increased it to 16... Table 5 provides the hyperparameters used in each experiment."
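The quoted schedule (initial learning rate 10^-3, halved after a 10-epoch validation plateau, training stopped once the rate falls below 10^-6) can be sketched in plain Python; the `PlateauDecay` class and the loop below are hypothetical, and an actual PyTorch implementation would typically use the `ReduceLROnPlateau` scheduler instead.

```python
# Sketch of the described schedule; hypothetical helper, not the paper's code.
class PlateauDecay:
    """Halve the learning rate whenever validation loss stagnates."""

    def __init__(self, lr=1e-3, factor=0.5, patience=10, min_lr=1e-6):
        self.lr = lr            # initial learning rate: 10^-3
        self.factor = factor    # decay factor on plateau: 0.5
        self.patience = patience  # epochs of stagnation before decay: 10
        self.min_lr = min_lr    # stop training below this rate: 10^-6
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return False once lr < min_lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr >= self.min_lr

sched = PlateauDecay()
epochs = 0
# With a validation loss that never improves, the rate halves every 10 epochs.
while sched.step(val_loss=1.0):
    epochs += 1
```

Ten halvings take 10^-3 below 10^-6 (0.5^10 ≈ 9.8 x 10^-4), at which point `step` returns False and the training loop exits.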