Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Model Reconciliation via Cost-Optimal Explanations in Probabilistic Logic Programming

Authors: Yinxu Tang, Stylianos Loukas Vasileiou, Vincent Derkinderen, William Yeoh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our approach is validated through a user study on explanation types and computational experiments showing that the optimized version consistently outperforms the generic baseline.
Researcher Affiliation Academia Yinxu Tang Washington University in St. Louis EMAIL Stylianos Loukas Vasileiou New Mexico State University EMAIL Vincent Derkinderen KU Leuven EMAIL William Yeoh Washington University in St. Louis EMAIL
Pseudocode Yes Algorithm 1 Generic Search Algorithm for Cost-Optimal Explanations
Open Source Code Yes The human-subject study, collected data, and implementation are released on https://github.com/YODA-Lab/ProbLog-Model-Reconciliation.
Open Datasets Yes The human-subject study, collected data, and implementation are released on https://github.com/YODA-Lab/ProbLog-Model-Reconciliation.
Dataset Splits No We generate 100 Agent-Human Model pairs for each configuration, totaling 1,600 pairs (4 Agent settings × 4 complexity levels × 100 repetitions) for each case in Definition 1.
Hardware Specification Yes All experiments were run on a MacBook Pro (M2, 16GB RAM).
Software Dependencies No Our approach is built on probabilistic logic programming (PLP) using ProbLog, where explanations are generated as cost-optimal model updates that reconcile these probabilistic differences.
Experiment Setup Yes Experimental Setup. Our experiments use two models: Agent and Human. Agent Model Ma: Each Ma contains |Fa| = 10, 20, 100, or 1000 probabilistic facts and |Ra| = 5, 10, 50, or 500 rules, respectively, all related to the same query. Facts have randomly assigned probabilities, and rules have bodies of 2-4 literals, generated based on cases in Definition 1. Human Model Mh: Derived from each Ma at four complexity levels l ∈ {20%, 40%, 60%, 80%}, reflecting the percentage of probabilistic facts that differ. Each differing fact has a 1/3 chance of being: modified (probability flipped), removed, or replaced (new fact). Human model rules share the same heads as the agent model but are built using existing facts, with the rule count as: |Rh| = |Ra| × (1 − 1/3 × l). We generate 100 Agent-Human Model pairs for each configuration, totaling 1,600 pairs (4 Agent settings × 4 complexity levels × 100 repetitions) for each case in Definition 1. Evaluation Metrics. All experiments are capped at 600 seconds per run.
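
The benchmark generation quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the linked repository). The function names, the fact naming scheme (f0, f1, ...), and the reading of "probability flipped" as p becoming 1 - p are all assumptions made for illustration. Rules are omitted, so the sketch covers only the probabilistic facts and the configuration count.

```python
import itertools
import random

# Values quoted in the Experiment Setup row: (|Fa|, |Ra|) pairs and complexity levels.
AGENT_SETTINGS = [(10, 5), (20, 10), (100, 50), (1000, 500)]
COMPLEXITY_LEVELS = [0.2, 0.4, 0.6, 0.8]
REPETITIONS = 100


def configurations():
    """Enumerate every (agent setting, complexity level, repetition) triple."""
    return list(itertools.product(AGENT_SETTINGS, COMPLEXITY_LEVELS, range(REPETITIONS)))


def make_agent_facts(num_facts, rng):
    """Hypothetical agent model: fact name -> randomly assigned probability."""
    return {f"f{i}": rng.random() for i in range(num_facts)}


def derive_human_facts(agent_facts, level, rng):
    """Derive human facts by altering a fraction `level` of the agent facts.

    Each selected fact is modified, removed, or replaced with equal chance.
    ASSUMPTION: "probability flipped" is read here as p -> 1 - p.
    """
    names = sorted(agent_facts)
    differing = rng.sample(names, round(level * len(names)))
    human = dict(agent_facts)
    for name in differing:
        operation = rng.choice(["modify", "remove", "replace"])
        if operation == "modify":
            human[name] = 1.0 - human[name]
        else:
            # Both "remove" and "replace" drop the original fact.
            del human[name]
            if operation == "replace":
                # Hypothetical naming for the substituted fact.
                human[f"new_{name}"] = rng.random()
    return human


if __name__ == "__main__":
    rng = random.Random(0)
    agent = make_agent_facts(10, rng)
    human = derive_human_facts(agent, 0.4, rng)
    print(len(configurations()), "pairs per case")
    print(len(agent), "agent facts;", len(human), "human facts")
```

Enumerating the product of 4 agent settings, 4 complexity levels, and 100 repetitions gives the 1,600 pairs per case stated in the setup. Seeding the random generator keeps a generated pair repeatable across runs.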