Flexible Models for Microclustering with Application to Entity Resolution
Authors: Brenda Betancourt, Giacomo Zanella, Jeffrey W. Miller, Hanna Wallach, Abbas Zaidi, Rebecca C. Steorts
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare models within this class to two commonly used clustering models using four entity-resolution data sets. In this section, we compare two entity resolution models based on the NBNB model and the NBD model to two similar models based on the DP mixture model [10] and the PYP mixture model [11]. |
| Researcher Affiliation | Collaboration | Giacomo Zanella, Department of Decision Sciences, Bocconi University (giacomo.zanella@unibocconi.it); Brenda Betancourt, Department of Statistical Science, Duke University (bb222@stat.duke.edu); Hanna Wallach, Microsoft Research (hanna@dirichlet.net); Jeffrey Miller, Department of Biostatistics, Harvard University (jwmiller@hsph.harvard.edu); Abbas Zaidi, Department of Statistical Science, Duke University (amz19@stat.duke.edu); Rebecca C. Steorts, Departments of Statistical Science and Computer Science, Duke University (beka@stat.duke.edu) |
| Pseudocode | No | The paper describes algorithms (e.g., 'reseating algorithm,' 'chaperones algorithm') but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | No statement or link regarding the release of open-source code for the methodology described in the paper was found. |
| Open Datasets | Yes | NLTCS5000: We derived this data set from the National Long Term Care Survey (NLTCS), a longitudinal survey of older Americans, conducted roughly every six years (http://www.nltcs.aas.duke.edu/). ... Syria2000 and SyriaSizes: We constructed these data sets from data collected by four human-rights groups between 2011 and 2014 on people killed in the Syrian conflict [19, 20]. |
| Dataset Splits | No | The paper refers to 'data sets' but does not explicitly specify training, validation, or test splits, nor does it provide details on how the data was partitioned for model development and evaluation. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions methods like 'slice sampling [17]' but does not provide specific software dependencies (e.g., library or solver names with version numbers) used for the implementation or experiments. |
| Experiment Setup | Yes | For the NBNB model and the NBD model, we set a and q to reflect a weakly informative prior belief that E[K] = N/2. For the NBNB model, we set η_r = s_r = 1 and u_p = v_p = 2. For the NBD model, we set α = 1 and set µ^(0) to be a geometric distribution over ℕ = {1, 2, . . .} with a parameter of 0.5. (An illustrative prior-simulation sketch based on these settings follows the table.) |
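
Below is a minimal, illustrative sketch (not the authors' implementation) of forward-simulating cluster sizes from an NBNB-style prior using the hyperparameter values quoted in the Experiment Setup row. The zero-truncation convention, the placeholder values of a and q, and the mapping of (a, q) and (r, p) onto NumPy's `negative_binomial` parameterization are assumptions made for this sketch.

```python
# Illustrative sketch only: forward-simulate cluster sizes under an NBNB-style prior
# using the hyperparameters quoted above. Truncation and parameter mappings are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def truncated_negbin(n, p_success, size, rng):
    """Draw from NegBin(n, p_success) restricted to {1, 2, ...} by rejection sampling."""
    draws = []
    while len(draws) < size:
        x = rng.negative_binomial(n, p_success, size=size)
        draws.extend(x[x >= 1].tolist())
    return np.array(draws[:size])

# Quoted hyperparameters: eta_r = s_r = 1 (Gamma prior on r), u_p = v_p = 2 (Beta prior on p).
eta_r, s_r = 1.0, 1.0
u_p, v_p = 2.0, 2.0
# a and q are placeholders here; the paper chooses them from a prior belief about E[K].
a, q = 1.0, 0.5

r = rng.gamma(eta_r, 1.0 / s_r)                   # r ~ Gamma(eta_r, rate=s_r)
p = rng.beta(u_p, v_p)                            # p ~ Beta(u_p, v_p)
K = int(truncated_negbin(a, 1.0 - q, 1, rng)[0])  # K ~ NegBin(a, q), truncated to K >= 1
sizes = truncated_negbin(r, 1.0 - p, K, rng)      # N_1, ..., N_K ~ NegBin(r, p), truncated to >= 1

print(f"K = {K} clusters; first sizes: {sizes[:10]}; total records N = {sizes.sum()}")
```

The rejection step simply enforces that every cluster contains at least one record; the paper's exact truncation and parameterization conventions for the negative binomial should be checked against its model definitions.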