reproducibilityindex.ai

KOGNAC: Efficient Encoding of Large Knowledge Graphs

Authors: Jacopo Urbani, Sourav Dutta, Sairam Gurajada, Gerhard Weikum

IJCAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges.
Researcher Affiliation	Academia	Jacopo Urbani (a), Sourav Dutta (b), Sairam Gurajada (b), and Gerhard Weikum (b); (a) VU University Amsterdam, The Netherlands; (b) Max Planck Institute for Informatics, Germany
Pseudocode	Yes	Algorithm 1: Locality-aware Encoding. Input: a KG, the taxonomy T, and the frequent dictionary D_freq generated by FBE. Output: the infrequent dictionary D_infreq.
Open Source Code	Yes	The KOGNAC code is available at https://github.com/jrbn/kognac.
Open Datasets	Yes	As input, we used three RDF graphs in NT format: LUBM [Guo et al., 2005], a popular benchmark tool, LDBC [Angles et al., 2014], another, more recent benchmark designed for advanced SPARQL 1.1 workloads, and DBPedia [Bizer et al., 2009], one of the most popular KGs.
Dataset Splits	No	The paper describes the datasets used for evaluation (LUBM, LDBC, DBPedia) and their sizes, but gives no training, validation, or test splits. This is an encoding paper, not a machine learning paper, so no model is trained.
Hardware Specification	Yes	We used two types of machines: M1, a dual 8-core 2.4 GHz Intel CPU, 64 GB RAM, and two disks of 4 TB in RAID-0; and M2, 16 quad-core Intel Xeon CPUs of 2.4 GHz with 48 GB of RAM.
Software Dependencies	No	The paper mentions a C++ prototype and specific RDF engines (RDF-3X, TripleBit, TriAD, MonetDB), but gives no version numbers for these or any other software components, which makes exact replication of the software environment difficult.
Experiment Setup	Yes	KOGNAC receives G and a threshold value k (used for the top-k frequent elements)... In our implementation, we selected as default values three hash functions for H, while m is the number of physical cores, and the value k is requested from the user. ... Therefore, all experiments in this section should be intended with k = 50.