SpeqNets: Sparsity-aware permutation-equivariant graph networks
Authors: Christopher Morris, Gaurav Rattan, Sandra Kiefer, Siamak Ravanbakhsh
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we aim to empirically investigate the learning performance of the kernel, see Appendix B.1, and neural architectures, see Section 4, based on the (k, s)-LWL, compared with standard kernel and (higher-order) GNN baselines. Concretely, we aim to answer the following questions. Q1 Do the (k, s)-LWL-based algorithms, both kernel and neural architectures, lead to improved classification and regression scores on real-world, graph-level benchmark datasets compared with dense algorithms and standard baselines? Q2 How does the (k, s)-SpeqNet architecture compare to standard GNN baselines on node-classification tasks? Q3 To what extent does the (k, s)-LWL reduce computation times compared with architectures induced by the k-WL? Q4 What is the effect of k and s with respect to computation times and predictive performance? The source code of all methods and evaluation procedures is available at https://www.github.com/chrsmrrs/speqnets. |
| Researcher Affiliation | Academia | Christopher Morris 1 2 3 Gaurav Rattan 1 Sandra Kiefer 4 Siamak Ravanbakhsh 2 3 1Department of Computer Science, RWTH Aachen University, Aachen, Germany 2Department of Computer Science, McGill University, Montreal, Canada 3Mila, Quebec AI Institute 4Max Planck Institute for Software Systems, Saarland Informatics Campus, Germany. Correspondence to: Christopher Morris <chris@christophermorris.info>. |
| Pseudocode | Yes | Algorithm 1 Generate (k, s)-multisets |
| Open Source Code | Yes | The source code of all methods and evaluation procedures is available at https://www.github.com/chrsmrrs/speqnets. |
| Open Datasets | Yes | To compare the (k, s)-LWL-based kernels, we used the well-known graph-classification benchmark datasets from (Morris et al., 2020a), see Table 3 for dataset statistics and properties. ... To compare the (k, s)-SpeqNet architecture with GNN baselines, we used the ALCHEMY (Chen et al., 2019a) and the QM9 (Ramakrishnan et al., 2014; Wu et al., 2018) graph regression datasets, again see Table 1 for dataset statistics and properties. ... All datasets are publicly available at www.graphlearning.io. ... For both datasets, we uniformly and at random sampled 80% of the graphs for training, and 10% for validation and testing, respectively. |
| Dataset Splits | Yes | Following the evaluation method proposed in (Morris et al., 2020a), the C-parameter was selected from {10^-3, 10^-2, ..., 10^2, 10^3} using a validation set sampled uniformly at random from the training fold (using 10% of the training fold). Similarly, the numbers of iterations of the (k, s)-LWL, (k, s)-LWL+, 1-WL, WLOA, δ-k-LWL, δ-k-LWL+, and k-WL were selected from {0, ..., 5} using the validation set. ... For both datasets, we uniformly and at random sampled 80% of the graphs for training, and 10% for validation and testing, respectively. ... We used the provided ten training, validation, and test splits for the node-classification datasets. |
| Hardware Specification | Yes | All kernel experiments were conducted on a workstation with 791GB of RAM using a single core. ... All neural experiments were conducted on a workstation with one GPU card with 32GB of GPU memory. |
| Software Dependencies | Yes | All kernels were (re-)implemented in C++11. ... Moreover, we used the GNU C++ Compiler 4.8.5 with the flag -O2. ... We implemented them using PyTorch Geometric (Fey & Lenssen, 2019), using a Python-wrapped C++11 preprocessing routine. |
| Experiment Setup | Yes | Following the evaluation method proposed in (Morris et al., 2020a), the C-parameter was selected from {10^-3, 10^-2, ..., 10^2, 10^3} using a validation set sampled uniformly at random from the training fold (using 10% of the training fold). Similarly, the numbers of iterations of the (k, s)-LWL, (k, s)-LWL+, 1-WL, WLOA, δ-k-LWL, δ-k-LWL+, and k-WL were selected from {0, ..., 5} using the validation set. ... The number of components of the (hidden) node features in {32, 64, 128} and the number of layers in {1, 2, 3, 4, 5} of the GIN and GIN-ε layer were again selected using a validation set sampled uniformly at random from the training fold (using 10% of the training fold). ... We used a 2-layer MLP for the final classification, using a dropout layer with p = 0.5 after the first layer of the MLP. ... we used six layers with 64 (hidden) node features and a set2seq layer (Vinyals et al., 2016) for graph-level pooling, followed by a 2-layer MLP for the joint regression of the twelve targets. |
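
The Dataset Splits and Open Datasets rows quote a uniform random 80%/10%/10% train/validation/test split for the ALCHEMY and QM9 graphs. A minimal sketch of how such a split could be drawn, assuming the dataset is an indexable collection; the function name and seeding are illustrative, not the released code:

```python
import torch


def random_split(dataset, train_frac=0.8, val_frac=0.1, seed=0):
    """Uniformly sample 80%/10%/10% train/validation/test indices at random."""
    generator = torch.Generator().manual_seed(seed)  # seed is an assumption
    perm = torch.randperm(len(dataset), generator=generator).tolist()
    n_train = int(train_frac * len(dataset))
    n_val = int(val_frac * len(dataset))
    # Remaining ~10% of the graphs form the test split.
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```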
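The Dataset Splits and Experiment Setup rows also describe selecting the SVM C-parameter from {10^-3, ..., 10^3} and the number of refinement iterations from {0, ..., 5} on a validation set drawn from the training fold. A hedged sketch of that selection loop, assuming one precomputed Gram matrix per iteration count and scikit-learn's `SVC(kernel="precomputed")`; the helper name and data layout are assumptions, not the authors' C++ kernel pipeline:

```python
import numpy as np
from sklearn.svm import SVC


def select_hyperparameters(gram_matrices, labels, train_idx, val_idx):
    """Pick the (#iterations, C) pair maximizing validation accuracy.

    gram_matrices: list indexed by iteration count (0..5), each an n x n
    kernel matrix over all graphs; labels is an array of class labels;
    train_idx/val_idx index into the training fold.
    """
    best = (None, None, -1.0)
    for n_iter, gram in enumerate(gram_matrices):
        for c in [10.0 ** e for e in range(-3, 4)]:  # C in {10^-3, ..., 10^3}
            clf = SVC(C=c, kernel="precomputed")
            clf.fit(gram[np.ix_(train_idx, train_idx)], labels[train_idx])
            acc = clf.score(gram[np.ix_(val_idx, train_idx)], labels[val_idx])
            if acc > best[2]:
                best = (n_iter, c, acc)
    return best
```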
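The Experiment Setup row further lists GIN/GIN-ε layers, a 2-layer MLP head (with dropout p = 0.5 for classification), and, for regression, six layers with 64 hidden node features plus a set2seq-style readout followed by a 2-layer MLP for the twelve targets. A minimal PyTorch Geometric sketch of that regression configuration, using `GINConv` and `Set2Set` as stand-ins for the paper's layers; the exact wiring is an assumption, not the released SpeqNet implementation:

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear, ReLU, Sequential
from torch_geometric.nn import GINConv, Set2Set


class GINRegressor(torch.nn.Module):
    """Six GIN-ε-style layers, set-based pooling, 2-layer MLP for 12 targets."""

    def __init__(self, in_dim, hidden=64, num_layers=6, num_targets=12):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            mlp = Sequential(Linear(dim, hidden), ReLU(), Linear(hidden, hidden))
            # train_eps=True roughly corresponds to the GIN-ε variant.
            self.convs.append(GINConv(mlp, train_eps=True))
            dim = hidden
        self.pool = Set2Set(hidden, processing_steps=3)  # graph-level readout
        # 2-layer MLP head; the classification variant quoted above would add
        # a dropout layer with p = 0.5 after the first linear layer.
        self.head = Sequential(Linear(2 * hidden, hidden), ReLU(),
                               Linear(hidden, num_targets))

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
        x = self.pool(x, batch)  # one embedding per graph (2 * hidden dims)
        return self.head(x)
```

Set2Set doubles the hidden dimension in its output, which is why the head's first linear layer takes 2 * hidden inputs; the number of processing steps is an illustrative choice.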