Modular Materialisation of Datalog Programs

Authors: Pan Hu, Boris Motik, Ian Horrocks (pp. 2859–2866)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have implemented our algorithms and compared them on several real-life and synthetic datasets. Our experiments illustrate the potential benefits of the proposed solution: our approach often outperforms state-of-the-art algorithms, sometimes by orders of magnitude.
Researcher Affiliation | Academia | Pan Hu, Boris Motik, Ian Horrocks; Department of Computer Science, University of Oxford, Oxford, United Kingdom; firstname.lastname@cs.ox.ac.uk
Pseudocode | Yes | Algorithm 1 MAT(Π, λ, E); Algorithm 2 MAT-MOD(Π, λ, E); Algorithm 3 DREDc-MOD(Π, λ, E, I, E−, E+, Cnr); Algorithm 4 Addtc(R)[Ip, In, ]; Algorithm 5 Deltc(R)[Ip, In, , Cnr]; Algorithm 6 Redtc(R)[Ip, In, ]; Algorithm 7 Addstc(R)[Ip, In, ]; Algorithm 8 Delstc(R)[Ip, In, , Cnr]; Algorithm 9 Redstc(R)[Ip, In, ]
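For orientation, the MAT algorithm listed above computes the materialisation of a Datalog program: the fixpoint obtained by applying the program's rules to the explicit facts until nothing new is derived. Below is a minimal, generic sketch of naive fixpoint materialisation in Python, with a hand-coded transitive-closure rule set standing in for the paper's program Π and module assignment λ. All names here are illustrative; this is not the paper's (semi-naive, modular) implementation.

```python
def materialise(apply_rules, edb):
    """Naive fixpoint materialisation: repeatedly apply all rules to the
    current set of facts until no new fact is derived."""
    I = set(edb)
    while True:
        new = apply_rules(I) - I
        if not new:
            return I
        I |= new

def tc_rules(I):
    # Two Datalog rules for reachability over an 'edge' relation:
    #   reach(x, y) :- edge(x, y).
    #   reach(x, z) :- reach(x, y), edge(y, z).
    edges = {(x, y) for (p, x, y) in I if p == "edge"}
    reach = {(x, y) for (p, x, y) in I if p == "reach"}
    out = {("reach", x, y) for (x, y) in edges}
    out |= {("reach", x, z) for (x, y) in reach
                            for (y2, z) in edges if y == y2}
    return out

edb = {("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)}
I = materialise(tc_rules, edb)
# I now contains derived facts such as ("reach", 1, 4)
```

The Add/Del/Red algorithms in the list maintain such a materialisation incrementally per rule module instead of recomputing this fixpoint from scratch.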
Open Source Code | Yes | Our system and test data are available online: http://krr-nas.cs.ox.ac.uk/2018/modular/
Open Datasets | Yes | We used the following real-world and synthetic benchmarks in our tests. LUBM (Guo, Pan, and Heflin 2005) ... DBpedia (Lehmann et al. 2015)...
Dataset Splits | No | The paper does not describe traditional machine learning train/validation/test splits for model training. It describes data usage for materialization and incremental updates.
Hardware Specification | Yes | We conducted all experiments on a Dell PowerEdge R730 server with 512 GB RAM and two Intel Xeon E5-2640 2.6 GHz processors running Fedora 27, kernel version 4.17.6.
Software Dependencies | No | The paper mentions the operating system and kernel version ('Fedora 27, kernel version 4.17.6') but does not specify other software dependencies, such as libraries or compilers with version numbers, that would be needed for reproduction.
Experiment Setup | Yes | In the first group, we tested the performance of our incremental algorithms on small changes. To this end, we used uniform sampling to select ten subsets Ei ⊆ E, 1 ≤ i ≤ 10, each consisting of 1000 facts from the input dataset. We deleted and then reinserted Ei for each i while measuring the wall-clock times, and then we computed the average times for deletion and insertion over the ten samples.
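The sampling-and-timing protocol described above can be sketched generically. In the Python sketch below, `delete_fn` and `insert_fn` are hypothetical hooks into a Datalog engine's incremental-update API (the paper's system is not exposed this way; this only illustrates the measurement loop):

```python
import random
import time

def benchmark_incremental(dataset, delete_fn, insert_fn,
                          n_samples=10, sample_size=1000, seed=0):
    """Measure average wall-clock time for deleting and then reinserting
    uniformly sampled subsets of the input facts, mirroring the paper's
    small-change experiment. delete_fn/insert_fn are stand-in hooks."""
    rng = random.Random(seed)          # fixed seed for repeatable samples
    del_times, ins_times = [], []
    for _ in range(n_samples):
        subset = rng.sample(dataset, sample_size)  # uniform, no replacement
        t0 = time.perf_counter()
        delete_fn(subset)              # incremental deletion of the subset
        del_times.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        insert_fn(subset)              # reinsertion of the same subset
        ins_times.append(time.perf_counter() - t0)
    return sum(del_times) / n_samples, sum(ins_times) / n_samples

# Usage with no-op hooks; a real run would call the reasoner's update API.
facts = list(range(100_000))
avg_del, avg_ins = benchmark_incremental(facts, lambda s: None, lambda s: None)
```

Averaging over ten independent samples, as the authors do, smooths out the variance that a single 1000-fact sample would introduce.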