Inferring Same-As Facts from Linked Data: An Iterative Import-by-Query Approach
Authors: Mustafa Al-Bakri, Manuel Atencia, Steffen Lalande, Marie-Christine Rousset
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We describe a novel import-by-query algorithm that alternates steps of sub-query rewriting and of tailored querying the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on a real-world dataset have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation. |
| Researcher Affiliation | Collaboration | 1 Univ. Grenoble Alpes, LIG, 38000, Grenoble, France; 2 CNRS, LIG, 38000, Grenoble, France; 3 Inria, 38330, Montbonnot-Saint-Martin, France; 4 Institut National de l'Audiovisuel, 94366, Bry-sur-Marne, France; 5 Institut Universitaire de France, 75005, Paris, France |
| Pseudocode | No | The paper describes the Import-by-Query and QESQ algorithms in text, but it does not include a formally structured pseudocode block or algorithm listing. |
| Open Source Code | No | The paper states 'Our algorithms have been implemented in SWI-Prolog.' but does not provide a specific link or explicit statement for the open-source code of their implementation. It provides links for rules and sample data, but not the algorithm's source code. |
| Open Datasets | Yes | The external datasets from Linked Open Data with which the INA vocabulary shares terms are DBpedia.org and DBpedia.fr. [...] For copyright reasons, we are not allowed to expose the whole INA dataset. However, 100 of the 500 facts from the sample, and corresponding INA data, may be found at http://goo.gl/amm1fJ and http://goo.gl/zBrqH5. |
| Dataset Splits | No | The paper describes the datasets used (INA dataset, DBpedia facts) but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | Yes | All the evaluations were done on a machine with an Intel i7 Quad-core processor and 6 GB of memory. |
| Software Dependencies | No | The paper states 'Our algorithms have been implemented in SWI-Prolog.' This names the software environment but does not provide specific version numbers for any libraries or dependencies. |
| Experiment Setup | Yes | We have conducted experiments on a real deductive dataset composed of 35 rules and 6 million RDF facts from INA dataset. The rules may be found at http://goo.gl/NfR12w. [...] In all our experiments we used edit distance and 0.99 as a threshold. |
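
The Experiment Setup row mentions edit distance with a 0.99 threshold but does not say how the distance is turned into a score. The sketch below shows one possible reading, assuming a Levenshtein distance normalized by the length of the longer string; the normalization and the function names `levenshtein`, `similarity`, and `is_match` are illustrative assumptions, not taken from the paper or its SWI-Prolog implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.99) -> bool:
    """Hypothetical helper: accept a pair when similarity reaches the threshold."""
    return similarity(a, b) >= threshold
```

Under this assumed normalization, a single edit yields a similarity of 1 - 1/n for a longer string of length n, so a 0.99 threshold only tolerates an edit when n is at least 100. For typical names and titles the test therefore behaves almost like exact matching.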