Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Embeddings as Probabilistic Equivalence in Logic Programs

Authors: Jaron Maene, Efthymia Tsamoura

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on well-known benchmarks show that the equivalence semantics leads to neurosymbolic models with up to 42% higher results than state-of-the-art baselines. Our empirical analysis against state-of-the-art techniques that learn and reason over symbol embeddings relying on soft unification [Rocktäschel and Riedel, 2017, Minervini et al., 2020a, Maene and De Raedt, 2023] shows that our semantics leads to models with up to 42% higher accuracy over the state-of-the-art. In Section 6, we assess the effect of our semantics on two settings where embeddings and rules need to be learnt jointly. First, we consider link prediction in knowledge graphs. Second, we consider differentiable finite state machines.
Researcher Affiliation | Collaboration | Jaron Maene, KU Leuven, Leuven, Belgium, EMAIL; Efthymia Tsamoura, Huawei Labs, Cambridge, United Kingdom, EMAIL
Pseudocode | Yes | Algorithm 1: Exact Inference. Input: Program P = (F, R), equivalence distribution PE as in (9), and target p-fact α. Output: Probability P(α). ... Algorithm 2: Approximate Inference. Input: Program P = (F, R), equivalence distribution PE as in (9), target p-fact α, and number of samples k. Output: Approximation of P(α).
Open Source Code | Yes | Supplementary material, including all code and proofs, is available at https://github.com/ML-KULeuven/equality_reasoning. The source code to replicate all experiments will be released upon publication.
Open Datasets | Yes | We use two well-known small knowledge graphs: countries [Bouchard et al., 2015] and nations [Rummel, 1992]. ... The countries (ODbL licence) and nations (CC0 license) knowledge graphs, grammars (CC0 license), and MNIST dataset (MIT license) are all publicly available.
Dataset Splits | Yes | We adopt the same splits as Rocktäschel and Riedel [2017] and Minervini et al. [2020b]. ... To test generalization, the test sequences have double the length and use images disjoint from the training split. ... For each fact p(a, b) in the test dataset, we take all possible corrupted facts by replacing one of the two arguments, creating p(a', b) and p(a, b'), and filter out any corrupted facts that appear in the knowledge graph.
Hardware Specification | Yes | We performed the experiments on servers with an Intel i7-12700 CPU and 64GB RAM, although lower resources may suffice. No GPU or TPU compute was used.
Software Dependencies | No | The paper mentions using the GLog Datalog engine and Adam optimizer but does not specify their version numbers or any other software dependencies with explicit version details, which are required for a 'Yes' classification.
Experiment Setup | Yes | All hyperparameters are summarized in Appendix B. As baselines, we compare with prior work that reasons on embeddings in logic programs. ... Table 5: Hyperparameters used in the countries experiment (cf. Table 1). ... Table 6: Hyperparameters used in the nations experiment (cf. Table 1). ... Table 7: Hyperparameters used in the differentiable finite state machines experiment (cf. Table 2).
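
The corruption-and-filter step quoted under Dataset Splits can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the triple layout (predicate, first argument, second argument), the function names `corrupted_facts` and `filtered_rank`, and the `score` callable are all hypothetical, and exactly which facts count as appearing in the knowledge graph is assumed to be supplied by the caller as `known_facts`.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical fact layout: (predicate, first argument, second argument).
Fact = Tuple[str, str, str]


def corrupted_facts(fact: Fact, entities: Iterable[str],
                    known_facts: Set[Fact]) -> List[Fact]:
    """Replace one argument of `fact` at a time, then drop known facts."""
    p, a, b = fact
    candidates: List[Fact] = []
    for e in sorted(entities):  # sorted only to make the output deterministic
        if e != a:
            candidates.append((p, e, b))  # corrupt the first argument
        if e != b:
            candidates.append((p, a, e))  # corrupt the second argument
    # Filtering: corrupted facts that appear in the knowledge graph are removed.
    return [c for c in candidates if c not in known_facts]


def filtered_rank(fact: Fact, entities: Iterable[str], known_facts: Set[Fact],
                  score: Callable[[Fact], float]) -> int:
    """Rank of `fact` among its filtered corruptions (1 is best).

    `score` is an assumed stand-in for whatever model assigns a
    probability or score to a fact.
    """
    target = score(fact)
    better = sum(1 for c in corrupted_facts(fact, entities, known_facts)
                 if score(c) > target)
    return 1 + better
```

A corrupted fact that is itself true would otherwise be counted against the model when it scores highly, which is why the filter is applied before ranking.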