reproducibilityindex.ai

Sub-Merge: Diving Down to the Attribute-Value Level in Statistical Schema Matching

Authors: Zhe Lim, Benjamin Rubinstein

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the superior statistical and computational performance of multiple sparse CCA compared to a suite of baseline algorithms, on two datasets which we are releasing to stimulate further research.
Researcher Affiliation	Collaboration	Zhe Lim and Benjamin I. P. Rubinstein, Department of Computing and Information Systems, The University of Melbourne, Australia. zhe@infinitelooplabs.com, brubinstein@unimelb.edu.au
Pseudocode	Yes	Algorithm 1 One Vs All CCA post-processing
Open Source Code	No	The paper states: “To foster further research on this problem, we are releasing with this paper two new manually-labeled datasets¹...”. Footnote 1 provides a URL for the datasets, but not for the source code of the methodology described in the paper.
Open Datasets	Yes	To foster further research on this problem, we are releasing with this paper two new manually-labeled datasets¹, constructed by multiple web crawls and crowd-sourced annotation. ¹Datasets at http://people.eng.unimelb.edu.au/brubinstein/data
Dataset Splits	No	The paper mentions “cross-validation to model select regularization terms” but does not specify the explicit training, validation, or test dataset splits (e.g., percentages or sample counts) used for its experiments.
Hardware Specification	Yes	We measure runtime on a PC with a 2.3GHz Intel Core i7 processor & 8GB of memory.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies or libraries used in the implementation of the described methods.
Experiment Setup	Yes	We perform binary search to set L1 penalties for a desired level of sparsity, and cross-validation to model select regularization terms (Witten, Tibshirani, and Hastie 2009). An important task is to determine the number of principal components. A natural approach is via the scree plot: eigenvalues by rank. The retained components can be set by thresholding the eigenvalues, which correspond to correlations under discovered components, or by identifying a knee in the curve.
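
The two tuning steps quoted under Experiment Setup can be illustrated with a minimal sketch. This is not the authors' code, which the paper does not release. It assumes the L1 penalty acts through a soft-thresholding operator on a single weight vector, and it covers only the eigenvalue-threshold variant of the scree-plot rule, not knee detection. The function names `soft_threshold`, `penalty_for_sparsity`, and `components_above_threshold` are hypothetical and do not come from the paper.

```python
def soft_threshold(values, penalty):
    """Shrink each entry toward zero by `penalty`; entries no larger than it become 0."""
    out = []
    for v in values:
        mag = abs(v) - penalty
        if mag > 0:
            out.append(mag if v > 0 else -mag)
        else:
            out.append(0.0)
    return out


def penalty_for_sparsity(values, target_nonzeros, tol=1e-9, max_iter=100):
    """Binary-search a penalty that leaves at most `target_nonzeros` non-zero entries.

    Assumption: sparsity is measured as the count of non-zero entries after
    soft-thresholding; the paper does not state its exact criterion.
    """
    lo, hi = 0.0, max(abs(v) for v in values)  # at `hi` every entry is zeroed
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2.0
        nonzeros = sum(1 for v in soft_threshold(values, mid) if v != 0.0)
        if nonzeros > target_nonzeros:
            lo = mid  # still too dense: a larger penalty is needed
        else:
            hi = mid  # sparse enough: try a smaller penalty
    return hi


def components_above_threshold(eigenvalues, threshold):
    """Count components whose eigenvalue, ranked in descending order, is >= threshold."""
    ranked = sorted(eigenvalues, reverse=True)
    return sum(1 for ev in ranked if ev >= threshold)


if __name__ == "__main__":
    weights = [0.9, -0.5, 0.3, 0.1, -0.05]  # illustrative values only
    penalty = penalty_for_sparsity(weights, target_nonzeros=2)
    print(penalty, soft_threshold(weights, penalty))
    print(components_above_threshold([0.95, 0.8, 0.3, 0.05], threshold=0.5))
```

The search always returns the upper bound `hi`, which has been checked to meet the sparsity target, so the result never leaves more non-zero entries than requested.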