Machine Learning and Constraint Programming for Relational-To-Ontology Schema Mapping
Authors: Diego De Uña, Nataliia Rümmele, Graeme Gange, Peter Schachte, Peter J. Stuckey
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Combining ML and CP achieves state-of-the-art precision, recall and speed, and provides a more flexible framework for variations of the problem. We run experiments on two domains: museum... and soccer... |
| Researcher Affiliation | Collaboration | Diego De Uña1, Nataliia Rümmele2, Graeme Gange1, Peter Schachte1 and Peter J. Stuckey1,3; 1Department of Computing and Information Systems, The University of Melbourne; 2Siemens, Germany; 3Data61, CSIRO, Melbourne, Australia |
| Pseudocode | No | The paper describes the Constraint Programming model with equations and variable definitions but does not present a formal pseudocode or algorithm block. |
| Open Source Code | Yes | SERENE1 http://github.com/NICTA/serene-python-client/tree/stp/stp |
| Open Datasets | No | We run experiments on two domains: museum (29 sources, 20 labels, 443 semantic attributes and 159 unknown attributes) and soccer (12 sources, 18 labels, 138 attributes and 45 unknowns). The paper describes the datasets but does not provide access information (link, DOI, or citation for public availability). |
| Dataset Splits | Yes | We follow the evaluation strategy outlined by Taheriyan et al. [2016a]. Let Mj be the set of j known semantic models. For each data source si in the domain we perform experiments t−1 times, where t is the total number of data sources in the domain and each experiment has a different number of known semantic models M1, M2, ..., Mt−1. For example, in the soccer domain, for source s1 we run experiments 11 times using M1 = {m2}, M2 = {m2, m3}, ..., M11 = {m2, m3, ..., m12}. We repeat the procedure for all sources in the domain and then average the results. This ensures that each source appears at least once in both the training and testing datasets. |
| Hardware Specification | Yes | We have run all our experiments on a Dell server with 252 GB of RAM, 2 CPUs (4 cores). |
| Software Dependencies | No | We used the MINIZINC language [Nethercote et al., 2007], and the CHUFFED solver [Chu, 2011]. Specific version numbers for MiniZinc or CHUFFED are not provided. |
| Experiment Setup | Yes | We use default parameters for KARMA which were shown to yield the best results. We use a timeout threshold of 15s for CHUFFED, which runs on a single core. In Fig. 5 we use a scaling factor 1 for pattern costs. When trying scaling factors 5, 10 or 20 for pattern costs from the museum domain we can generate semantic models which are almost 90% in precision and recall. |
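The incremental evaluation strategy described in the Dataset Splits row (for each test source, train on growing subsets of the other sources' semantic models, then average over all sources) can be sketched as follows. This is a minimal illustration, not the authors' code; `evaluate` is a hypothetical stand-in for the actual relational-to-ontology mapping pipeline.

```python
def incremental_evaluation(sources, evaluate):
    """For each source s_i, run t-1 experiments with known-model sets
    M_1, ..., M_{t-1} drawn from the other sources, then average.

    `evaluate(test_source, known_models)` is a hypothetical callback
    returning a score (e.g. F1) for one experiment.
    """
    t = len(sources)
    scores = []
    for i, test_source in enumerate(sources):
        # The t-1 remaining sources supply the known semantic models.
        others = sources[:i] + sources[i + 1:]
        for j in range(1, t):          # growing sets M_1 ... M_{t-1}
            known_models = others[:j]
            scores.append(evaluate(test_source, known_models))
    return sum(scores) / len(scores)   # average over all experiments
```

With the 12-source soccer domain this yields 12 × 11 = 132 experiments; in the paper's example, source s1 is evaluated 11 times against M1 = {m2} up to M11 = {m2, ..., m12}.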