Inferring Same-As Facts from Linked Data: An Iterative Import-by-Query Approach

Authors: Mustafa Al-Bakri, Manuel Atencia, Steffen Lalande, Marie-Christine Rousset

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We describe a novel import-by-query algorithm that alternates steps of sub-query rewriting and of tailored querying the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on a real-world dataset have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation.
Researcher Affiliation | Collaboration | 1 Univ. Grenoble Alpes, LIG, 38000, Grenoble, France; 2 CNRS, LIG, 38000, Grenoble, France; 3 Inria, 38330, Montbonnot-Saint-Martin, France; 4 Institut National de l'Audiovisuel, 94366, Bry-sur-Marne, France; 5 Institut Universitaire de France, 75005, Paris, France
Pseudocode | No | The paper describes the Import-by-Query and QESQ algorithms in text, but it does not include a formally structured pseudocode block or algorithm listing.
Open Source Code | No | The paper states 'Our algorithms have been implemented in SWI-Prolog.' but does not link to the source code of the implementation or state that it is openly available. It provides links to the rules and to sample data, but not to the algorithms' source code.
Open Datasets | Yes | The external datasets from Linked Open Data with which the INA vocabulary shares terms are DBpedia.org and DBpedia.fr. [...] For copyright reasons, we are not allowed to expose the whole INA dataset. However, 100 of the 500 facts from the sample, and corresponding INA data, may be found at http://goo.gl/amm1fJ and http://goo.gl/zBrqH5.
Dataset Splits | No | The paper describes the datasets used (INA dataset, DBpedia facts) but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | Yes | All the evaluations were done on a machine with an Intel i7 Quad-core processor and 6 GB of memory.
Software Dependencies | No | The paper states 'Our algorithms have been implemented in SWI-Prolog.' This names the software environment but gives no version number for SWI-Prolog and lists no libraries or other dependencies.
Experiment Setup | Yes | We have conducted experiments on a real deductive dataset composed of 35 rules and 6 million RDF facts from INA dataset. The rules may be found at http://goo.gl/NfR12w. [...] In all our experiments we used edit distance and 0.99 as a threshold.
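
The experiment setup above mentions edit distance with a threshold of 0.99 but does not say how the distance is turned into a score. The following is a minimal Python sketch of one common reading, assuming a Levenshtein distance normalized by the length of the longer string; the normalization and the function names (`edit_distance`, `similarity`, `is_match`) are illustrative assumptions and are not taken from the paper or its SWI-Prolog implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.99) -> bool:
    """True when the normalized similarity reaches the threshold."""
    return similarity(a, b) >= threshold
```

Under this assumed normalization, a threshold of 0.99 is very strict: a single differing character in a short name (for example a trailing period) already drops the score below the threshold, so short strings match only when they are identical.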