Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Systematic Relational Reasoning With Epistemic Graph Neural Networks

Authors: Irtaza Khalid, Steven Schockaert

ICLR 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show that Epi GNNs achieve state-of-the-art results on link prediction tasks that require systematic reasoning. Furthermore, for inductive knowledge graph completion, Epi GNNs rival the performance of state-of-the-art specialized approaches. Finally, we introduce two new benchmarks that go beyond standard relational reasoning by requiring the aggregation of information from multiple paths. Here, existing neuro-symbolic approaches fail, yet Epi GNNs learn to reason accurately. |
| Researcher Affiliation | Academia | Irtaza Khalid & Steven Schockaert, Cardiff University, UK, EMAIL |
| Pseudocode | No | The paper describes the proposed Epi GNN model and related algorithms conceptually using mathematical formulas and textual descriptions, but it does not include a distinct, structured pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code and datasets are available at https://github.com/erg0dic/gnn-sg. |
| Open Datasets | Yes | Code and datasets are available at https://github.com/erg0dic/gnn-sg. ... We introduce two new benchmarks: one based on RCC-8 and one based on IA. ... We release these benchmarks under a CC-BY 4.0 license. |
| Dataset Splits | Yes | For CLUTRR, RCC-8 and IA, to test for systematic generalization, models are trained on small graphs and subsequently evaluated on larger graphs. ... We use a standard 80-20 split for training and validation for CLUTRR and RCC-8. For GraphLog, we use the validation set that is provided separately from the test set. ... In inductive KGC, models are evaluated on a test graph which is disjoint from the training graph. |
| Hardware Specification | Yes | All experiments were conducted using RTX 4090 and V100 NVIDIA GPUs. |
| Software Dependencies | No | The paper mentions "We use the Adam optimizer (Kingma & Ba, 2017)" but does not specify version numbers for any programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The number of layers of the Epi GNN model is fixed to 9 and the number of negative examples per instance is fixed as 1. The other hyperparameters of the Epi GNN model are tuned using grid search. The optimal values that were obtained are mentioned in Table 11. ... We conduct the following hyperparameter sweeps: learning rate in {0.00001, 0.001, 0.01, 0.1}, batch size in {16, 32, 64, 128}, number of facets m in {1, 2, 4, 8, 16, 32} and embedding dimension size in {8, 16, 32, 64, 128, 256}. We also tune the margin in the loss function over {10, 1.1, 1.0, 0.9, . . . , 0.1, 0.01}. |
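The grid search reported in the Experiment Setup row can be enumerated as a minimal sketch. The grid values below are those quoted from the paper; the parameter names and the enumeration helper are our own illustration (the authors' actual tuning harness is not shown here), and the margin grid is omitted because its quoted range is partially elided.

```python
from itertools import product

# Hyperparameter grids quoted from the paper's experiment setup.
# Key names are hypothetical labels for illustration.
GRID = {
    "learning_rate": [0.00001, 0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64, 128],
    "num_facets": [1, 2, 4, 8, 16, 32],
    "embedding_dim": [8, 16, 32, 64, 128, 256],
}

def grid_configs(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_configs(GRID))
print(len(configs))  # 4 * 4 * 6 * 6 = 576 combinations to evaluate
```

Each yielded dict would parameterize one training run; the best configuration per dataset is what the paper reports in its Table 11.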