Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Covered Forest: Fine-grained generalization analysis of graph neural networks

Authors: Antonis Vasileiou, Ben Finkelshtein, Floris Geerts, Ron Levie, Christopher Morris

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical study supports our theoretical insights, improving our understanding of MPNNs generalization properties. and 6. Experimental study
Researcher Affiliation	Academia	1 RWTH Aachen University, Germany; 2 University of Oxford, UK; 3 University of Antwerp, Belgium; 4 Technion Israel Institute of Technology, Israel.
Pseudocode	No	The paper describes algorithms such as the 1-dimensional Weisfeiler Leman algorithm and its variants, but these descriptions are presented in prose and mathematical notation rather than dedicated pseudocode or algorithm blocks.
Open Source Code	Yes	See https://github.com/benfinkelshtein/CoveredForests for source code and instructions to reproduce all results.
Open Datasets	Yes	Additionally, we experimented with the binary classification real-world datasets MUTAG, NCI1, MCF-7H (Morris et al., 2020a), and OGBG-MOLHIV (Hu et al., 2020)
Dataset Splits	Yes	We used a random 80/10/10 split for training/validation/testing.
Hardware Specification	Yes	All models were implemented with PyTorch Geometric (Fey & Lenssen, 2019) and executed on a system with 128GB of RAM and an Nvidia Tesla A100 GPU with 48GB of memory.
Software Dependencies	No	All models were implemented with PyTorch Geometric (Fey & Lenssen, 2019) and executed on a system with 128GB of RAM and an Nvidia Tesla A100 GPU with 48GB of memory. - While software is mentioned, specific version numbers for key libraries like PyTorch or CUDA are not provided. and Adam optimizer (Kingma & Ba, 2015)
Experiment Setup	Yes	We tuned the feature dimension across the set {32, 64, 128, 256} based on validation set performance, training MUTAG and NCI1 for 100 epochs and MCF-7H and OGBG-MOLHIV for 20 epochs using the Adam optimizer (Kingma & Ba, 2015). The training setup included a learning rate of 0.001, a batch size of 128, and no learning rate decay or dropout across all datasets.
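
The Dataset Splits and Experiment Setup rows can be read together as a small configuration. The following is a minimal sketch in plain Python, not the authors' code: the dictionary layout, the key names, the helper random_split, and the seed are all assumptions introduced for illustration (the quoted text does not report a seed or say how fractional split sizes are rounded). Only the numeric values are taken from the table above.

```python
import random

# Hyperparameters as quoted in the Experiment Setup row. The dictionary
# layout and key names are assumptions made for this sketch.
CONFIG = {
    "feature_dims": [32, 64, 128, 256],  # tuned on validation performance
    "epochs": {"MUTAG": 100, "NCI1": 100, "MCF-7H": 20, "OGBG-MOLHIV": 20},
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "batch_size": 128,
    "lr_decay": None,  # no learning rate decay
    "dropout": 0.0,    # no dropout
}


def random_split(num_graphs, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Hypothetical helper: shuffle indices and cut them 80/10/10.

    The seed and the floor-based rounding are assumptions; any remainder
    from rounding ends up in the test portion.
    """
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * num_graphs)
    n_val = int(fractions[1] * num_graphs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Example with 188 graphs (the size of MUTAG).
train_idx, val_idx, test_idx = random_split(188)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the split is random and no seed is reported, a rerun with a different seed would produce different partitions, so results may vary slightly from run to run.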