Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Logical Expressiveness of Graph Neural Networks with Hierarchical Node Individualization

Authors: Arie Soeteman, Balder ten Cate

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results confirm the practical feasibility of HE-GNNs and show benefits in comparison with traditional GNN architectures, both with and without local homomorphism count features. ... Section 6: Experiments ... Table 1: Mean absolute error on ZINC-12k, test scores after 10 runs.
Researcher Affiliation	Academia	Arie Soeteman¹, Balder ten Cate¹; ¹Institute for Logic, Language and Computation, University of Amsterdam
Pseudocode	Yes	1: function run_A(G, emb) 2: emb^0 := emb 3: for i = 1, ..., L do 4: emb^i := {v : COM_i(emb^{i-1}(v) ⊕ AGG_i({{emb^{i-1}(u) | (v, u) ∈ E}})) | v ∈ V} 5: return emb^L
Open Source Code	Yes	All code used for the experiments is available on git (footnote 2: https://github.com/ariesoeteman/HEGNN).
Open Datasets	Yes	We apply HES-GNN to ZINC-12k [17, 28]... ZINC-12k is publicly available, and the description of the synthetic dataset should be sufficient for reproduction. All used data is publicly available.
Dataset Splits	No	Details such as data splits and hyperparameters are not mentioned in the paper itself, but all details are provided in the form of code on git.
Hardware Specification	Yes	We use feature dimension 256 for all models for a maximum of 1000 epochs on a single 20GB gpu.
Software Dependencies	No	The paper does not explicitly list software dependencies with specific version numbers in the main text.
Experiment Setup	Yes	We use feature dimension 256 for all models for a maximum of 1000 epochs on a single 20GB gpu. ... Table 1 shows the achieved mean absolute error after 10 runs and validation score selection. ... Experiments run with batch size 20 and feature dimension 256.
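
The pseudocode quoted in the Pseudocode row describes a layer-wise aggregate-and-combine loop over node embeddings. The following is a minimal Python sketch of that loop, not the authors' implementation (which is in the linked repository). It assumes that COM_i receives the node's own embedding concatenated with the aggregated neighbour embeddings; the names run_gnn, sum_agg, and add_com, the one-dimensional embeddings, and the three-node path graph are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

Node = Hashable
Embedding = Tuple[float, ...]
Layer = Tuple[Callable[[Counter], Embedding], Callable[[Embedding], Embedding]]


def run_gnn(
    nodes: List[Node],
    edges: List[Tuple[Node, Node]],
    emb: Dict[Node, Embedding],
    layers: List[Layer],
) -> Dict[Node, Embedding]:
    """Apply one aggregate-and-combine round per (AGG_i, COM_i) pair."""
    for agg, com in layers:
        new_emb = {}
        for v in nodes:
            # Multiset of embeddings of all u with (v, u) in the edge list.
            neighbours = Counter(emb[u] for (src, u) in edges if src == v)
            # Assumption: COM_i takes emb(v) concatenated with the aggregate.
            new_emb[v] = com(emb[v] + agg(neighbours))
        emb = new_emb
    return emb


def sum_agg(multiset: Counter) -> Embedding:
    """Hypothetical AGG: sum 1-dimensional embeddings with multiplicity."""
    return (sum(x[0] * count for x, count in multiset.items()),)


def add_com(concatenated: Embedding) -> Embedding:
    """Hypothetical COM: add the node's own value to the aggregated value."""
    return (concatenated[0] + concatenated[1],)


# Toy undirected path a - b - c, stored as directed edges in both directions.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
initial = {v: (1.0,) for v in nodes}

result = run_gnn(nodes, edges, initial, [(sum_agg, add_com)] * 2)
print(result)
```

After two rounds, the middle node b ends with a different value from the endpoints a and c, which shows how repeated aggregation lets embeddings reflect each node's position in the graph. A learned model would replace sum_agg and add_com with trainable functions.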