Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Persistent Weisfeiler-Lehman Procedure for Graph Classification
Authors: Bastian Rieck, Christian Bock, Karsten Borgwardt
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following, we describe the practical performance of our methods on numerous graph classification benchmark data sets. |
| Researcher Affiliation | Academia | Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland. |
| Pseudocode | Yes | Algorithm 1 Persistent Subtree Feature Generation |
| Open Source Code | Yes | Please refer to our repository (https://github.com/BorgwardtLab/P-WL) for the code and additional experiments. |
| Open Datasets | Yes | We use common graph benchmark data sets in our experiments, comprising graphs from chemoinformatics problems (Debnath et al. 1991), toxicology prediction (Helma et al. 2001), protein function/structure prediction (Borgwardt et al. 2005, Dobson & Doig 2003), carcinogenicity prediction (Wale et al. 2008), and social network analysis (Leskovec et al. 2005, Yanardag & Vishwanathan 2015). |
| Dataset Splits | Yes | We follow the standard setup for graph classification and perform a 10-fold cross-validation that we repeat 10 times, reporting the average and standard deviation for all runs. For hyperparameter tuning, we use an inner 5-fold cross-validation on each of the training splits to perform a grid search. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper states 'We implemented our methods in Python' but does not specify version numbers for Python or any other software dependencies or libraries. |
| Experiment Setup | Yes | For hyperparameter tuning, we use an inner 5-fold cross-validation on each of the training splits to perform a grid search. As for the hyperparameters, we choose p ∈ {1, 2} for P-WL and P-WL-C, and h ∈ {0, ..., 10} for methods based on WL features, whereas we choose C ∈ {0.1, 1, 10} for training an SVM on the P-WL-D kernel values. Moreover, since we did not observe an effect in changing σ for P-WL-D, we leave σ = 1.0 fixed. |
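
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (10-fold cross-validation repeated 10 times, with an inner 5-fold grid search on each training split) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' code. The helper names (`k_fold_indices`, `complement`, `nested_cv`, `knn_accuracy`), the synthetic one-dimensional data, and the toy k-nearest-neighbour classifier are all assumptions introduced here. The toy classifier stands in for the SVM on P-WL features, and its grid `[1, 3, 5]` stands in for grids such as C ∈ {0.1, 1, 10}. Folds are not stratified, and the standard deviation is taken over all outer folds; the quoted text does not specify either choice.

```python
import random
import statistics


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def complement(folds, held_out):
    """Return all indices that are not in the held-out fold."""
    return [j for i, fold in enumerate(folds) if i != held_out for j in fold]


def nested_cv(n_samples, evaluate, grid, outer_k=10, inner_k=5, repeats=10, seed=0):
    """Repeated outer k-fold CV with an inner k-fold grid search.

    `evaluate(train_idx, test_idx, param)` is a hypothetical callback that
    fits on `train_idx` and returns the accuracy on `test_idx`.
    Returns one test accuracy per outer fold (outer_k * repeats values).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        outer = k_fold_indices(n_samples, outer_k, rng)
        for i, test in enumerate(outer):
            train = complement(outer, i)
            # Inner folds are built from the training split only, so the
            # held-out test fold never influences hyperparameter selection.
            inner = [[train[p] for p in fold]
                     for fold in k_fold_indices(len(train), inner_k, rng)]

            def inner_score(param):
                return statistics.mean(
                    evaluate(complement(inner, m), val, param)
                    for m, val in enumerate(inner)
                )

            best = max(grid, key=inner_score)  # grid search on inner CV
            scores.append(evaluate(train, test, best))
    return scores


# Synthetic stand-in data (assumption): two classes on the real line.
data_rng = random.Random(1)
ys = [i % 2 for i in range(60)]
xs = [data_rng.gauss(3.0 * y, 1.0) for y in ys]


def knn_accuracy(train_idx, test_idx, k):
    """Toy k-nearest-neighbour classifier; a stand-in for the paper's SVM."""
    correct = 0
    for t in test_idx:
        nearest = sorted(train_idx, key=lambda j: abs(xs[j] - xs[t]))[:k]
        votes = sum(ys[j] for j in nearest)
        pred = 1 if 2 * votes > len(nearest) else 0
        correct += int(pred == ys[t])
    return correct / len(test_idx)


scores = nested_cv(len(xs), knn_accuracy, grid=[1, 3, 5])
mean_acc = statistics.mean(scores)
std_acc = statistics.stdev(scores)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over {len(scores)} outer folds")
```

Selecting the hyperparameter inside each training split, rather than once on the full data set, keeps the reported accuracy from being tuned on the test folds. In practice a library routine such as scikit-learn's `GridSearchCV` combined with a repeated k-fold splitter would likely replace the hand-written loops.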