Computing the Why-Provenance for Datalog Queries via SAT Solvers

Authors: Marco Calautti, Ester Livshits, Andreas Pieris, Markus Schneider

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we experimentally confirm the efficiency of our approach with a dedicated benchmark, and we further show that, in general, it outperforms the machinery proposed in (Elhalawati, Kr otzsch, and Mennicke 2022). Experimental Evaluation
Researcher Affiliation Academia 1Department of Computer Science, University of Milan 2School of Informatics, University of Edinburgh 3Department of Computer Science, University of Cyprus
Pseudocode No The paper describes methods and processes conceptually and mathematically, but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code Yes The experimental scenarios and the source code are given at https://gitlab.com/aaai24whyprov/datalog-why-provenance.
Open Datasets No The paper mentions database names like Dbitcoin, Dfacebook, Doctors-i, Galen, Andersen, and CSDA in Table 1, stating they are 'from the Datalog literature' but does not provide specific links, DOIs, or formal citations for public access to these datasets themselves.
Dataset Splits No The paper evaluates the performance of the why-provenance computation on different databases and selected tuples, but it does not specify training, validation, or test dataset splits in the context of machine learning model training.
Hardware Specification Yes All the experiments have been conducted on a laptop with an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, and 32GB of RAM, running Fedora Linux 37.
Software Dependencies Yes We used Python 3.11.2, and the C++ code has been compiled with g++ 12.2.1, using the -O3 optimization flag. To construct the downward closure we can exploit a state-of-the-art Datalog engine, that is, version 2.1.1 of DLV (Adrian et al. 2018). Finally, we ran the state-of-the-art SAT solver Glucose (see, e.g., (Audemard and Simon 2018)), version 4.2.1.
Experiment Setup Yes For each database of the scenario, we report a box-plot collecting all the runtimes relative to the 100 tuples associated to that database. Then, for each i [100], we constructed the downward closure of R( ti s,D) w.r.t. D and Σ by first computing the adapted query Q and database D via a Python 3 implementation and then using DLV for the actual computation of the downward closure. Then, we constructed the Boolean formula φ( ti s,D,D,Q) via a C++ implementation. Finally, we ran the state-of-the-art SAT solver Glucose... for our implementation, we employ the technique of vertex elimination (Rankooh and Rintanen 2022). We set a 5 minutes timeout for both approaches.