Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Systematic Approach to Universal Random Features in Graph Neural Networks

Authors: Billy Joe Franks, Markus Anders, Marius Kloft, Pascal Schweitzer

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new comprehensive framework that captures all previous URF techniques. On the theoretical side, among other results, we formally prove that under natural conditions all instantiations of our framework are universal. The framework thus provides a new simple technique to prove universality results. On the practical side, we develop a method to systematically and automatically train URF. This in turn enables us to impartially and objectively compare all existing URF. New URF naturally emerge from our approach, and our experiments demonstrate that they improve the state of the art.
Researcher Affiliation | Academia | Billy Joe Franks (EMAIL), Department of Computer Science, University of Kaiserslautern-Landau (RPTU); Markus Anders, Department of Mathematics, TU Darmstadt; Marius Kloft, Department of Computer Science, University of Kaiserslautern-Landau (RPTU); Pascal Schweitzer, Department of Mathematics, TU Darmstadt
Pseudocode | No | The paper describes methods and processes using mathematical formulas and descriptive text, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured, code-like steps for a procedure.
Open Source Code | No | We used python 3.8.10 to implement all the models and conduct all the experiments: dejavu-gi 0.1.3 (for IRNI(CR))... However, our experiments might not be perfectly reproducible, as dejavu, the package we use to calculate random IR paths, does not allow for its seed to be set.
Open Datasets | Yes | We evaluate different models on datasets used in prior work on URF, specifically EXP, CEXP, TRI, TRIX, CSL, PROTEINS, MUTAG, and NCI1 (Srinivasan et al.; Borgwardt et al., 2005; Wale & Karypis, 2006; Murphy et al., 2019; Sato et al., 2021; Abboud et al., 2021).
Dataset Splits | Yes | To estimate the performance, we use Monte Carlo cross-validation in an outer test loop and an inner validation loop, estimating a nested 10-/9-fold cross-validation... Each dataset is split using stratified 10-fold cross-validation with random shuffling. The first fold is used as the test set and the 9 others are used for Bayesian hyperparameter optimization... In the inner loop, which we referred to before as just Bayesian hyperparameter optimization, the data is split using stratified 9-fold cross-validation. The first fold is used as a validation set and the other 8 folds are used to train the model, after which its performance is reported on the validation set. This is repeated 3 times with different random shuffles.
Hardware Specification | Yes | The system that was used to do the experiments mentioned in this work is made up of: 2 Intel(R) Xeon(R) Gold 6154 CPUs @ 3.00GHz, 754 GiB system memory, 10 GeForce RTX 2080 Ti GPUs.
Software Dependencies | Yes | We used python 3.8.10 to implement all the models and conduct all the experiments: dejavu-gi 0.1.3 (for IRNI(CR)), networkx 2.6.3 (for constructing TRI and TRIX), numpy 1.21.4, scikit-learn 1.0.1, scikit-optimize 0.9.0 (for Bayesian hyperparameter optimization), scipy 1.7.3, torch 1.10.0, torch-geometric 2.0.2 (specifically for graph-related machine learning).
Experiment Setup | Yes | For all experiments, we use the same general architecture, the Adam optimizer, and the area under the receiver operating characteristic curve (AUROC). We optimize each method using a Bayesian hyperparameter search in the same hyperparameter space. To estimate the performance, we use Monte Carlo cross-validation in an outer test loop and an inner validation loop, estimating a nested 10-/9-fold cross-validation. The Bayesian hyperparameter search is capped at evaluating 50 points in hyperparameter space. To encourage the models to optimize faster as well as to avoid overfitting, we add a penalty to the AUROC estimate based on some hyperparameters. The reported test AUROC does not include these penalties.
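For reproduction, the dependency quote under Software Dependencies can be pinned as a requirements file. Package names and versions below are taken verbatim from that quote (Python 3.8.10 itself is installed separately); this is a sketch, not a file shipped by the authors.

```text
dejavu-gi==0.1.3
networkx==2.6.3
numpy==1.21.4
scikit-learn==1.0.1
scikit-optimize==0.9.0
scipy==1.7.3
torch==1.10.0
torch-geometric==2.0.2
```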
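The nested split quoted under Dataset Splits can be sketched with scikit-learn's `StratifiedKFold`: an outer stratified 10-fold whose first fold is the test set, and an inner stratified 9-fold on the remaining data whose first fold is the validation set, repeated 3 times with different shuffles. The toy dataset and seeds here are assumptions for illustration; the paper's actual models and data are not reproduced.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy stand-in for a graph-classification dataset (hypothetical data).
X = np.arange(180).reshape(90, 2)   # 90 examples, 2 dummy features
y = np.array([0, 1] * 45)           # balanced binary labels

# Outer loop: stratified 10-fold with random shuffling; the first fold is
# the test set, the remaining 9 folds go to hyperparameter optimization.
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
dev_idx, test_idx = next(iter(outer.split(X, y)))

# Inner loop: stratified 9-fold on the development data; the first fold is
# the validation set, the other 8 folds train the model. Repeated 3 times
# with different random shuffles, as the quote describes.
val_sizes = []
for repeat in range(3):
    inner = StratifiedKFold(n_splits=9, shuffle=True, random_state=repeat)
    train_idx, val_idx = next(iter(inner.split(X[dev_idx], y[dev_idx])))
    # model fitting/scoring would go here; we only record the fold sizes.
    val_sizes.append((len(train_idx), len(val_idx)))

print(len(dev_idx), len(test_idx), val_sizes[0])  # 81 9 (72, 9)
```

With 90 examples, each outer fold holds 9 examples, so the development set has 81 examples and each inner training/validation split is 72/9.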
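The penalized search objective described under Experiment Setup can be sketched as follows. The paper says only that a penalty "based on some hyperparameters" is added to the AUROC estimate during the Bayesian search (which scikit-optimize would cap at 50 evaluations) and that the reported test AUROC excludes it; the linear penalty form, the cost constants, and the hyperparameter names (`n_layers`, `hidden_dim`) below are assumptions for illustration.

```python
from sklearn.metrics import roc_auc_score

def penalized_objective(y_true, y_score, n_layers, hidden_dim,
                        layer_cost=1e-3, width_cost=1e-5):
    """Search objective: negative AUROC plus a hyperparameter penalty.

    The penalty discourages needlessly large models; its exact form is
    hypothetical here. The unpenalized AUROC is returned alongside the
    objective, since the reported metric excludes the penalty.
    """
    auroc = roc_auc_score(y_true, y_score)
    penalty = layer_cost * n_layers + width_cost * hidden_dim
    return -(auroc - penalty), auroc

# Toy predictions (hypothetical): a minimizer would receive the first value.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
obj, auroc = penalized_objective(y_true, y_score, n_layers=2, hidden_dim=64)
print(round(auroc, 2), round(obj, 5))  # 0.75 -0.74736
```

A function of this shape is what one would hand to a minimizer such as scikit-optimize's `gp_minimize` with `n_calls=50`, matching the 50-point cap in the quote.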