Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Model Reconciliation via Cost-Optimal Explanations in Probabilistic Logic Programming

Authors: Yinxu Tang, Stylianos Loukas Vasileiou, Vincent Derkinderen, William Yeoh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our approach is validated through a user study on explanation types and computational experiments showing that the optimized version consistently outperforms the generic baseline.
Researcher Affiliation	Academia	Yinxu Tang Washington University in St. Louis EMAIL Stylianos Loukas Vasileiou New Mexico State University EMAIL Vincent Derkinderen KU Leuven EMAIL William Yeoh Washington University in St. Louis EMAIL
Pseudocode	Yes	Algorithm 1 Generic Search Algorithm for Cost-Optimal Explanations
Open Source Code	Yes	The human-subject study, collected data, and implementation are released on https://github.com/YODA-Lab/ProbLog-Model-Reconciliation.
Open Datasets	Yes	The human-subject study, collected data, and implementation are released on https://github.com/YODA-Lab/ProbLog-Model-Reconciliation.
Dataset Splits	No	We generate 100 Agent-Human Model pairs for each configuration, totaling 1,600 pairs (4 Agent settings × 4 complexity levels × 100 repetitions) for each case in Definition 1.
Hardware Specification	Yes	All experiments were run on a MacBook Pro (M2, 16GB RAM).
Software Dependencies	No	Our approach is built on probabilistic logic programming (PLP) using ProbLog, where explanations are generated as cost-optimal model updates that reconcile these probabilistic differences.
Experiment Setup	Yes	Experimental Setup. Our experiments use two models: Agent and Human. Agent Model Ma: Each Ma contains |Fa| = 10, 20, 100, or 1000 probabilistic facts and |Ra| = 5, 10, 50, or 500 rules, respectively, all related to the same query. Facts have randomly assigned probabilities, and rules have bodies of 2-4 literals, generated based on cases in Definition 1. Human Model Mh: Derived from each Ma at four complexity levels l ∈ {20%, 40%, 60%, 80%}, reflecting the percentage of probabilistic facts that differ. Each differing fact has a 1/3 chance of being: modified (probability flipped), removed, or replaced (new fact). Human model rules share the same heads as the agent model but are built using existing facts, with the rule count as: |Rh| = |Ra| · (1 − 1/3 · l). We generate 100 Agent-Human Model pairs for each configuration, totaling 1,600 pairs (4 Agent settings × 4 complexity levels × 100 repetitions) for each case in Definition 1. Evaluation Metrics. All experiments are capped at 600 seconds per run.
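
The configuration grid quoted in the Experiment Setup row can be enumerated with a short Python sketch, which also checks the 1,600-pair total. This is an illustration only and is not taken from the authors' repository. The names AGENT_SETTINGS, human_rule_count, and configurations are hypothetical. The sketch assumes the rule-count expression is |Rh| = |Ra| · (1 − l/3) and that fractional results are rounded down; the quoted text does not state a rounding rule.

```python
from itertools import product

# (|Fa|, |Ra|) pairs as quoted in the setup description.
AGENT_SETTINGS = [(10, 5), (20, 10), (100, 50), (1000, 500)]
# Complexity levels l, kept as integer percentages to avoid float rounding.
COMPLEXITY_PERCENTS = [20, 40, 60, 80]
REPETITIONS = 100


def human_rule_count(num_agent_rules: int, percent: int) -> int:
    """Assumed formula: |Rh| = |Ra| * (1 - l/3), rounded down (assumption)."""
    # (1 - percent/300) rewritten over a common denominator of 300.
    return num_agent_rules * (300 - percent) // 300


def configurations():
    """Yield one record per Agent-Human model pair in the grid."""
    for (facts, rules), percent, rep in product(
        AGENT_SETTINGS, COMPLEXITY_PERCENTS, range(REPETITIONS)
    ):
        yield {
            "agent_facts": facts,
            "agent_rules": rules,
            "complexity_percent": percent,
            # Number of probabilistic facts that differ in the human model.
            "differing_facts": facts * percent // 100,
            "human_rules": human_rule_count(rules, percent),
            "repetition": rep,
        }


pairs = list(configurations())
print(len(pairs))
```

The count printed is per case in Definition 1; the number of cases is not given in the quoted text, so the sketch does not multiply by it.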