Sub-Merge: Diving Down to the Attribute-Value Level in Statistical Schema Matching
Authors: Zhe Lim, Benjamin Rubinstein
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the superior statistical and computational performance of multiple sparse CCA compared to a suite of baseline algorithms, on two datasets which we are releasing to stimulate further research. |
| Researcher Affiliation | Collaboration | Zhe Lim and Benjamin I. P. Rubinstein, Department of Computing and Information Systems, The University of Melbourne, Australia; zhe@infinitelooplabs.com, brubinstein@unimelb.edu.au |
| Pseudocode | Yes | Algorithm 1: One Vs All CCA post-processing |
| Open Source Code | No | The paper states: “To foster further research on this problem, we are releasing with this paper two new manually-labeled datasets¹...”. Footnote 1 provides a URL for the datasets, but not for the source code of the methodology described in the paper. |
| Open Datasets | Yes | To foster further research on this problem, we are releasing with this paper two new manually-labeled datasets¹, constructed by multiple web crawls and crowd-sourced annotation. ¹ Datasets at http://people.eng.unimelb.edu.au/brubinstein/data |
| Dataset Splits | No | The paper mentions “cross-validation to model select regularization terms” but does not specify the training, validation, or test splits (e.g., percentages or sample counts) used in its experiments. |
| Hardware Specification | Yes | We measure runtime on a PC with a 2.3GHz Intel Core i7 processor & 8GB of memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation of the described methods. |
| Experiment Setup | Yes | We perform binary search to set L1 penalties for a desired level of sparsity, and cross-validation to model select regularization terms (Witten, Tibshirani, and Hastie 2009). An important task is to determine the number of principal components. A natural approach is via the scree plot: eigenvalues by rank. The retained components can be set by thresholding the eigenvalues, which correspond to correlations under discovered components, or by identifying a knee in the curve. |
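
The two setup steps quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' implementation, and no source code was released with the paper. Soft-thresholding stands in for the sparse CCA solver, and the function names (`soft_threshold`, `search_l1_penalty`, `retained_components`, `knee_index`), the largest-drop knee heuristic, and all numeric values are assumptions made for illustration only.

```python
def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; entries with |v| <= lam become 0."""
    return [(abs(v) - lam) * (1.0 if v > 0 else -1.0) if abs(v) > lam else 0.0
            for v in values]


def count_nonzero(values):
    return sum(1 for v in values if v != 0.0)


def search_l1_penalty(values, target_nnz, tol=1e-10, max_iter=100):
    """Bisect on the penalty until the thresholded vector has target_nnz non-zeros.

    The non-zero count never increases as the penalty grows, so bisection
    applies. If ties make the exact target unreachable, the last penalty
    tried is returned.
    """
    lo, hi = 0.0, max(abs(v) for v in values)  # a penalty of hi zeroes everything
    lam = 0.5 * (lo + hi)
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        nnz = count_nonzero(soft_threshold(values, lam))
        if nnz == target_nnz:
            return lam
        if nnz > target_nnz:
            lo = lam  # too dense: raise the penalty
        else:
            hi = lam  # too sparse: lower the penalty
        if hi - lo < tol:
            break
    return lam


def retained_components(eigenvalues, threshold):
    """Scree-plot thresholding: count eigenvalues strictly above the threshold."""
    return sum(1 for v in eigenvalues if v > threshold)


def knee_index(eigenvalues):
    """Assumed knee heuristic: keep the components before the largest drop."""
    vals = sorted(eigenvalues, reverse=True)
    if len(vals) < 2:
        return len(vals)
    drops = [vals[i] - vals[i + 1] for i in range(len(vals) - 1)]
    return drops.index(max(drops)) + 1


# Hypothetical inputs, chosen only to exercise the functions.
weights = [0.9, -0.5, 0.3, 0.1, -0.05]
lam = search_l1_penalty(weights, target_nnz=2)
sparse = soft_threshold(weights, lam)

scree = [0.95, 0.9, 0.85, 0.2, 0.1, 0.05]
kept_by_threshold = retained_components(scree, threshold=0.5)
kept_by_knee = knee_index(scree)
```

In practice, the penalty search would wrap a call to the sparse CCA solver rather than a single soft-thresholding step. The same bisection logic applies as long as sparsity is monotone in the penalty.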