A Theoretical Analysis of First Heuristics of Crowdsourced Entity Resolution

Authors: Arya Mazumdar, Barna Saha

AAAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we make the first attempt to close this gap. We provide a thorough analysis of the prominent heuristic algorithms for crowd-based ER. We justify experimental observations with our analysis and information theoretic lower bounds. Moreover, we conduct a thorough experiment on the bibliographical cora (McCallum 2004) dataset for ER and several synthetic datasets to validate the theoretical findings further.
Researcher Affiliation	Academia	Arya Mazumdar and Barna Saha, College of Information & Computer Sciences, University of Massachusetts Amherst, {arya,barna}@cs.umass.edu
Pseudocode	No	The paper describes algorithms in prose but does not provide structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described.
Open Datasets	Yes	We used the widely used cora (McCallum 2004) dataset for ER.
Dataset Splits	No	The paper mentions using "cora" and "synthetic datasets" but does not provide specific details on training, validation, or test data splits.
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers.
Experiment Setup	Yes	We created multiple synthetic datasets each containing 1200 nodes and 14 clusters with the following size distribution: two clusters of size 200, four clusters of size 100, eight clusters of size 50, two clusters each of size 30 and 20 and the rest of the clusters of size 10. The datasets differed in the way similarity values are generated by varying ϵ and sampling the values either from Dist-1 or Dist-2. The similarity values are further discretized to take values from the set {0, 0.1, 0.2, ..., 0.9, 1}. We used the similarity function as in (Whang, Lofgren, and Garcia-Molina 2013; Wang et al. 2013; Vesdapunt, Bellare, and Dalvi 2014; Firmani, Saha, and Srivastava 2016).
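
The discretization step quoted in the Experiment Setup row can be illustrated with a short sketch. The excerpt does not say how values are mapped onto the set {0, 0.1, 0.2, ..., 0.9, 1}, so rounding to the nearest level is an assumption here, and the function name `discretize_similarity` is hypothetical rather than taken from the paper.

```python
def discretize_similarity(similarity: float, step: float = 0.1) -> float:
    """Snap a similarity score in [0, 1] to the nearest multiple of `step`.

    Assumption: nearest-level rounding. The paper excerpt only states that
    values are discretized to {0, 0.1, ..., 0.9, 1}, not the exact rule.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    levels = round(1.0 / step)  # 10 levels above zero for step = 0.1
    return round(similarity * levels) / levels


if __name__ == "__main__":
    for raw in (0.0, 0.34, 0.58, 0.96, 1.0):
        print(raw, "->", discretize_similarity(raw))
```

Dividing by an integer number of levels, rather than multiplying by `step`, keeps each output equal to the float literal for that level (for example `0.3` rather than `0.30000000000000004`).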