Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Enhancing Datalog Reasoning with Hypertree Decompositions
Authors: Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows that, when the program contains complex rules, the combined approach is usually significantly faster than the baseline approach, sometimes by orders of magnitude. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of Oxford, Oxford, UK 2 School of Electrical Information and Electronic Engineering, Shanghai Jiao Tong University, China 3 Oxford Semantic Technologies, Oxford, UK |
| Pseudocode | Yes | Algorithm 1 MAT(Π, E); Algorithm 2 DRed(Π, E, I, E+, E−); Algorithm 3 Add^r[I, +]; Algorithm 4 Cross Node Evaluation^r(L); Algorithm 5 Del^r[I, −]; Algorithm 6 Red^r[I, −] |
| Open Source Code | Yes | Our test system and data are available online: https://xinyuezhang.xyz/HDReasoning/ |
| Open Datasets | Yes | We tested our system using the well-known LUBM and YAGO benchmarks [Guo et al., 2005; Suchanek et al., 2008] |
| Dataset Splits | Yes | We evaluate the running time of materialisation (the initial materialisation with all the explicitly given facts inserted as E+ in Algorithm 2), small deletions (randomly selecting 1,000 facts from the dataset as E− in Algorithm 2), large deletions (randomly selecting 25% of the dataset as E−), and small additions (adding 1,000 facts as E+ into the dataset). |
| Hardware Specification | Yes | All of our experiments are conducted on a Dell Power Edge R730 server with 512GB RAM and 2 Intel Xeon E5-2640 2.60GHz processors, running Fedora 33, kernel version 5.10.8. |
| Software Dependencies | Yes | All of our experiments are conducted on a Dell Power Edge R730 server with 512GB RAM and 2 Intel Xeon E5-2640 2.60GHz processors, running Fedora 33, kernel version 5.10.8. |
| Experiment Setup | Yes | In our implementation, for (1), we use the standard textbook cardinality estimator described in Chapter 16.4 of the book [Garcia-Molina, 2008] to estimate the cardinality of ⋈_{Bi ∈ λ(p)} Bi for a node p; for (2), we use 2 · (\|p̂i\| + \|p̂j\|) to estimate the cost of performing semi-joins between nodes pi and pj, where \|p̂i\| and \|p̂j\| represent the estimated node sizes. Moreover, the extra step of full reducer we introduced in Algorithm 4 (line 6) is more suitable for small updates, in which the node with the smallest size helps reduce other large nodes. If the sizes of all the nodes are comparable, then this step would be unnecessary. Therefore, in practice, we only perform this optimisation if the number of active instantiations in the node (i.e., pi) is more than three times smaller than the maximum number of active instantiations in each node. |
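The cost model and trigger condition quoted in the Experiment Setup row can be sketched as follows. This is a hedged reconstruction, not the authors' code: the function names `estimate_semijoin_cost` and `should_full_reduce` are hypothetical, and node sizes stand in for the estimated numbers of active instantiations per decomposition node.

```python
def estimate_semijoin_cost(size_i: int, size_j: int) -> int:
    """Estimated cost of a semi-join between nodes p_i and p_j,
    following the quoted formula 2 * (|p_i| + |p_j|), where the
    sizes are estimated node cardinalities."""
    return 2 * (size_i + size_j)


def should_full_reduce(node_sizes: list[int]) -> bool:
    """Trigger the extra full-reducer step (Algorithm 4, line 6)
    only when the smallest node is more than three times smaller
    than the largest one, i.e. when a small node can cheaply
    prune the large ones after a small update."""
    return min(node_sizes) * 3 < max(node_sizes)
```

For example, `should_full_reduce([400, 10, 100])` is true (10 · 3 < 400), while for nodes of comparable size such as `[100, 90, 120]` the step is skipped, matching the paper's remark that it is unnecessary when node sizes are comparable.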