Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

What Expressivity Theory Misses: Message Passing Complexity for GNNs

Authors: Niklas Kemper, Tom Wollschläger, Stephan Günnemann

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive validation on fundamental GNN tasks, we show that MPC s theoretical predictions correlate with empirical performance, successfully explaining architectural successes and failures. We extensively validate MPC, showing its consistency with empirical performance and its superiority over classical expressivity theory in explaining real-world MPNN behavior ( 5). For empirical validation, we train all model architectures on 2000 randomly generated 3-regular graphs with varying numbers of layers L. As shown in Figs. 3 and 9, the complexity measure based on iso expressivity theory, WLC, assigns constant zero complexity regardless of depth L, indicating only theoretical solvability. In contrast, MPC shows perfect negative Spearman correlation with accuracy for most architectures, capturing the increasing difﬁculty with L.
Researcher Affiliation	Academia	Niklas Kemper Tom Wollschläger Stephan Günnemann School of Computation, Information and Technology & Munich Data Science Institute Technical University of Munich EMAIL
Pseudocode	Yes	We show in Algorithm 1 for the exemplary task of propagating information from a source node u to a target node v how the complexities can be efﬁciently simulated using Monte Carlo simulation. Algorithm 1 Propagating Information Simulation
Open Source Code	Yes	Find our implementation at https://www.cs.cit.tum.de/daml/message-passing-complexity/. Additionally, we provide our code at https: //www.cs.cit.tum.de/daml/message-passing-complexity/.
Open Datasets	Yes	The peptides dataset of the lrgb [14] is released under a CC BY-NC 4.0 license. The ZINC dataset [15] is distributed under a custom license (free to use for everyone).
Dataset Splits	Yes	For our experiments on benchmark datasets, we use the standard pre-deﬁned splits. We use a training set of size 2000, a validation set of size 500 and a test set of size 2000. For each distance D, we use training sets of different sizes and a validation set of size 500 and a test set of size 2000. For each cycle size, we generate training sets of varying size, a validation set of size 1000 and a test set of size 10000.
Hardware Specification	Yes	All experiments are conducted on NVIDIA Ge Force GTX 1080 GPUs with 16GB memory allocation per job. Monte Carlo simulations for complexity calculations run on a single Intel Xeon E5-2630 v4 CPU (2.20GHz) and complete in under 10 seconds.
Software Dependencies	No	We use pytorch [39] and torch-geometric [17] (released under an MIT license) for the implementation of all models. For optimization, we use the Adam optimizer [30]. The training, validation and test sets containing random r-regular graphs are generated using networkx [24].
Experiment Setup	Yes	All experimental details including the type of optimizer, the model hyperparameters and how they were chosen are provided in App. D.1. For optimization, we use the Adam optimizer [30]. We found little difference between learning rates in {0.001, 0.005, 0.01} for all models and tasks. The shown results are for the learning rate 0.005. We train all models for all settings using three different seeds for a maximum of 50 epochs, showing the average results. Additionally, we use Batch Norm [26] for normalization.