Machine Learning and Constraint Programming for Relational-To-Ontology Schema Mapping
Authors: Diego De Uña, Nataliia Rümmele, Graeme Gange, Peter Schachte, Peter J. Stuckey
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Combining ML and CP achieves state-of-the-art precision, recall and speed, and provides a more flexible framework for variations of the problem. We run experiments on two domains: museum... and soccer... |
| Researcher Affiliation | Collaboration | Diego De Uña1, Nataliia Rümmele2, Graeme Gange1, Peter Schachte1 and Peter J. Stuckey1,3; 1Department of Computing and Information Systems, The University of Melbourne; 2Siemens, Germany; 3Data61, CSIRO, Melbourne, Australia |
| Pseudocode | No | The paper describes the Constraint Programming model with equations and variable definitions but does not present a formal pseudocode or algorithm block. |
| Open Source Code | Yes | SERENE1 http://github.com/NICTA/serene-python-client/tree/stp/stp |
| Open Datasets | No | We run experiments on two domains: museum (29 sources, 20 labels, 443 semantic attributes and 159 unknown attributes) and soccer (12 sources, 18 labels, 138 attributes and 45 unknowns). The paper describes the datasets but does not provide access information (link, DOI, or citation for public availability). |
| Dataset Splits | Yes | We follow the evaluation strategy outlined by Taheriyan et al. [2016a]. Let Mj be the set of j known semantic models. For each data source si in the domain we perform experiments t−1 times, where t is the total number of data sources in the domain and each experiment has a different number of known semantic models M1, M2, ..., Mt−1. For example, in the soccer domain, for source s1 we run experiments 11 times using M1 = {m2}, M2 = {m2, m3}, ..., M11 = {m2, m3, ..., m12}. We repeat the procedure for all sources in the domain and then average the results. This ensures that each source appears at least once in both the training and testing datasets. |
| Hardware Specification | Yes | We have run all our experiments on a Dell server with 252 GB of RAM, 2 CPUs (4 cores). |
| Software Dependencies | No | We used the MINIZINC language [Nethercote et al., 2007], and the CHUFFED solver [Chu, 2011]. Specific version numbers for MiniZinc or CHUFFED are not provided. |
| Experiment Setup | Yes | We use default parameters for KARMA which were shown to yield the best results. We use a timeout threshold of 15s for CHUFFED, which runs on a single core. In Fig. 5 we use a scaling factor 1 for pattern costs. When trying scaling factors 5, 10 or 20 for pattern costs from the museum domain we can generate semantic models which are almost 90% in precision and recall. |
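The incremental evaluation strategy described in the Dataset Splits row (for each test source, train on growing subsets of the other sources' semantic models, then average over all sources) can be sketched as follows. This is a minimal illustration, not the authors' code; `evaluate` is a hypothetical stand-in for the actual relational-to-ontology mapping pipeline.

```python
def incremental_evaluation(sources, evaluate):
    """For each source s_i, run t-1 experiments with known-model sets
    M_1, ..., M_{t-1} drawn from the other sources, then average.

    `evaluate(test_source, known_models)` is a hypothetical callback
    returning a score (e.g. F1) for one experiment.
    """
    t = len(sources)
    scores = []
    for i, test_source in enumerate(sources):
        # The t-1 remaining sources supply the known semantic models.
        others = sources[:i] + sources[i + 1:]
        for j in range(1, t):          # growing sets M_1 ... M_{t-1}
            known_models = others[:j]
            scores.append(evaluate(test_source, known_models))
    return sum(scores) / len(scores)   # average over all experiments
```

With the 12-source soccer domain this yields 12 × 11 = 132 experiments; in the paper's example, source s1 is evaluated 11 times against M1 = {m2} up to M11 = {m2, ..., m12}.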