KOGNAC: Efficient Encoding of Large Knowledge Graphs

Authors: Jacopo Urbani, Sourav Dutta, Sairam Gurajada, Gerhard Weikum

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges.
Researcher Affiliation | Academia | Jacopo Urbani (a), Sourav Dutta (b), Sairam Gurajada (b), and Gerhard Weikum (b); (a) VU University Amsterdam, The Netherlands; (b) Max Planck Institute for Informatics, Germany
Pseudocode | Yes | Algorithm 1: Locality-aware Encoding. Input: a KG, the taxonomy T, and the frequent dictionary D_freq generated by FBE. Output: the infrequent dictionary D_infreq.
Open Source Code | Yes | The KOGNAC code is available at https://github.com/jrbn/kognac.
Open Datasets | Yes | As input, we used three RDF graphs in NT format: LUBM [Guo et al., 2005] a popular benchmark tool, LDBC [Angles et al., 2014], another, more recent benchmark designed for advanced SPARQL 1.1 workloads, and DBPedia [Bizer et al., 2009], one of the most popular KGs.
Dataset Splits | No | The paper describes the evaluation datasets (LUBM, LDBC, DBPedia) and their sizes but gives no training, validation, or test splits; it is an encoding paper and does not train a machine learning model.
Hardware Specification | Yes | We used two types of machines: M1, a dual 8-core 2.4 GHz Intel CPU, 64 GB RAM, and two disks of 4 TB in RAID-0; and M2, a 16 quad-core Intel Xeon CPUs of 2.4GHz with 48GB of RAM.
Software Dependencies | No | The paper mentions a C++ prototype and specific RDF engines (RDF-3X, TripleBit, TriAD, MonetDB) but gives no version numbers for these or for any other software component, which makes exact replication of the software environment difficult.
Experiment Setup | Yes | KOGNAC receives G and a threshold value k (used for the top-k frequent elements)... In our implementation, we selected as default values three hash functions for H, while m is the number of physical cores, and the value k is requested from the user. ... Therefore, all experiments in this section should be intended with k = 50.
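
The Experiment Setup entry mentions a top-k threshold and three hash functions for H. As a rough illustration only, the Python sketch below assumes that H is a family of hash functions used for approximate counting (as in a Count-Min sketch) and that the k most frequent terms receive the smallest numeric IDs. Neither assumption is confirmed by the text quoted here, and the names CountMinSketch, encode_terms, and width are hypothetical; the actual implementation is the C++ code in the linked repository. The parameter m (number of cores) is not modelled.

```python
import hashlib


def _bucket(term: str, seed: int, width: int) -> int:
    """Map a term to a bucket; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        term.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % width


class CountMinSketch:
    """Approximate counter: estimates never fall below the true count."""

    def __init__(self, width: int = 1024, num_hashes: int = 3):
        self.width = width
        self.rows = [[0] * width for _ in range(num_hashes)]

    def add(self, term: str) -> None:
        for seed, row in enumerate(self.rows):
            row[_bucket(term, seed, self.width)] += 1

    def estimate(self, term: str) -> int:
        return min(
            row[_bucket(term, seed, self.width)]
            for seed, row in enumerate(self.rows)
        )


def encode_terms(triples, k: int, width: int = 1024) -> dict:
    """Assign IDs 0..k-1 to the k terms with the highest estimated frequency,
    then the remaining IDs to all other terms in first-seen order."""
    sketch = CountMinSketch(width=width, num_hashes=3)
    terms, seen = [], set()
    for triple in triples:
        for term in triple:
            sketch.add(term)
            # Simplification: keeping every distinct term in memory is only
            # practical for small inputs.
            if term not in seen:
                seen.add(term)
                terms.append(term)
    # sorted() is stable, so ties keep first-seen order.
    ranked = sorted(terms, key=lambda t: -sketch.estimate(t))
    frequent = ranked[:k]
    frequent_set = set(frequent)
    infrequent = [t for t in terms if t not in frequent_set]
    return {term: i for i, term in enumerate(frequent + infrequent)}


triples = [
    ("a", "type", "Student"),
    ("b", "type", "Student"),
    ("c", "type", "Professor"),
    ("a", "advisor", "c"),
]
dictionary = encode_terms(triples, k=1)
```

In this toy graph "type" occurs most often, so with k = 1 it is the only term placed in the frequent range. The infrequent terms are numbered here in first-seen order, which is a placeholder and not the locality-aware assignment of Algorithm 1.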