Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Embeddings as Probabilistic Equivalence in Logic Programs

Authors: Jaron Maene, Efthymia Tsamoura

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on well-known benchmarks show that the equivalence semantics leads to neurosymbolic models with up to 42% higher results than state-of-the-art baselines. Our empirical analysis against state-of-the-art techniques that learn and reason over symbol embeddings relying on soft unification [Rocktäschel and Riedel, 2017, Minervini et al., 2020a, Maene and De Raedt, 2023] shows that our semantics leads to models with up to 42% higher accuracy over the state-of-the-art. In Section 6, we assess the effect of our semantics on two settings where embeddings and rules need to be learnt jointly. First, we consider link prediction in knowledge graphs. Second, we consider differentiable finite state machines.
Researcher Affiliation	Collaboration	Jaron Maene, KU Leuven, Leuven, Belgium; Efthymia Tsamoura, Huawei Labs, Cambridge, United Kingdom
Pseudocode	Yes	Algorithm 1: Exact Inference Input: Program P = (F, R), equivalence distribution PE as in (9), and target p-fact α. Output: Probability P(α) ... Algorithm 2: Approximate Inference Input: Program P = (F, R), equivalence distribution PE as in (9), target p-fact α, and number of samples k. Output: Approximation of P(α).
Open Source Code	Yes	Supplementary material, including all code and proofs, is available at https://github.com/ML-KULeuven/equality_reasoning. The source code to replicate all experiments will be released upon publication.
Open Datasets	Yes	We use two well-known small knowledge graphs: countries [Bouchard et al., 2015] and nations [Rummel, 1992]. ... The countries (ODbL licence) and nations (CC0 license) knowledge graphs, grammars (CC0 license), and MNIST dataset (MIT license) are all publicly available.
Dataset Splits	Yes	We adopt the same splits as Rocktäschel and Riedel [2017] and Minervini et al. [2020b]. ... To test generalization, the test sequences have double the length and use images disjoint from the training split. ... For each fact p(a, b) in the test dataset, we take all possible corrupted facts by replacing one of the two arguments, creating p(a′, b) and p(a, b′), and filter out any corrupted facts that appear in the knowledge graph.
Hardware Specification	Yes	We performed the experiments on servers with an Intel i7-12700 CPU and 64GB RAM, although lower resources may suffice. No GPU or TPU compute was used.
Software Dependencies	No	The paper mentions using the GLog Datalog engine and Adam optimizer but does not specify their version numbers or any other software dependencies with explicit version details, which are required for a 'Yes' classification.
Experiment Setup	Yes	All hyperparameters are summarized in Appendix B. As baselines, we compare with prior work that reasons on embeddings in logic programs. ... Table 5: Hyperparameters used in the countries experiment (c.f. Table 1). ... Table 6: Hyperparameters used in the nations experiment (c.f. Table 1). ... Table 7: Hyperparameters used in the differentiable finite state machines experiment (c.f. Table 2).
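
The filtered corruption step quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filtered_corruptions`, the representation of a fact as a (predicate, subject, object) tuple, and the toy entities are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumed representation: a fact p(a, b) is the tuple (p, a, b).
Fact = Tuple[str, str, str]


def filtered_corruptions(
    fact: Fact, entities: Iterable[str], known_facts: Set[Fact]
) -> List[Fact]:
    """Corrupt one argument of `fact` at a time, then drop corruptions
    that already appear in the knowledge graph (`known_facts`)."""
    p, a, b = fact
    candidates: List[Fact] = []
    for e in entities:
        if e != a:
            candidates.append((p, e, b))  # replace the first argument: p(a', b)
        if e != b:
            candidates.append((p, a, e))  # replace the second argument: p(a, b')
    # Filtering step: a corrupted fact that is itself true is not a valid negative.
    return [c for c in candidates if c not in known_facts]


# Toy knowledge graph (hypothetical data, for illustration only).
entities = ["x", "y", "z"]
known = {("r", "x", "y"), ("r", "z", "y")}
corrupted = filtered_corruptions(("r", "x", "y"), entities, known)
print(corrupted)
```

In this toy example the candidate r(z, y) is discarded because it is a known fact, so it would not be counted against the test fact r(x, y) when ranking.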