Column-Oriented Datalog Materialization for Large Knowledge Graphs

Authors: Jacopo Urbani, Ceriel Jacobs, Markus Krötzsch

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation shows that this approach can often match or even surpass the performance of state-of-the-art systems, especially under restricted resources. We evaluate a prototype implementation of our approach. Evaluation results show that our approach can significantly reduce the amount of main memory needed for materialization, while maintaining competitive runtimes.
Researcher Affiliation | Academia | Jacopo Urbani, Dept. Computer Science, VU University Amsterdam, Amsterdam, The Netherlands, jacopo@cs.vu.nl; Ceriel Jacobs, Dept. Computer Science, VU University Amsterdam, Amsterdam, The Netherlands, c.j.h.jacobs@vu.nl; Markus Krötzsch, Faculty of Computer Science, Technische Universität Dresden, Dresden, Germany, markus.kroetzsch@tu-dresden.de
Pseudocode | No | The paper describes its procedural steps, particularly for semi-naive evaluation and optimizations. However, it does not include a clearly labeled "Pseudocode" or "Algorithm" block or figure.
Open Source Code | Yes | Our source code and a short tutorial is found at https://github.com/jrbn/vlog.
Open Datasets | Yes | We used largely the same data that was also used to evaluate RDFox (Motik et al. 2014). Datasets and Datalog programs are available online (footnote 2: http://www.cs.ox.ac.uk/isg/tools/RDFox/2014/AAAI/Data/Rules). The datasets we used are the cultural-heritage ontology Claros (Motik et al. 2014), the DBpedia KG extracted from Wikipedia (Bizer et al. 2009), and two differently sized graphs generated with the LUBM benchmark (Guo, Pan, and Heflin 2005).
Dataset Splits | No | The paper evaluates on several datasets but does not specify training, validation, or test splits (e.g., percentages or sample counts). Each dataset is used in full for materialization.
Hardware Specification | Yes | The computer used in all experiments is a Macbook Pro with a 2.2GHz Intel Core i7 processor, 512GB SSD, and 16GB RAM running on Mac OS Yosemite OS v10.10.5.
Software Dependencies | Yes | All software (ours and competitors) was compiled from C++ sources using Apple CLang/LLVM v6.1.0.
Experiment Setup | Yes | In the "Experimental Setup" section, the paper describes system-level settings for VLog, such as "VLog was always used with dynamic optimizations activated but without memoization" and the "timeout (default 1 sec)" for memoization. It also details the specific versions of competitor software used.