Flexible Models for Microclustering with Application to Entity Resolution

Authors: Brenda Betancourt, Giacomo Zanella, Jeffrey W. Miller, Hanna Wallach, Abbas Zaidi, Rebecca C. Steorts

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare models within this class to two commonly used clustering models using four entity-resolution data sets. In this section, we compare two entity resolution models based on the NBNB model and the NBD model to two similar models based on the DP mixture model [10] and the PYP mixture model [11].
Researcher Affiliation | Collaboration | Giacomo Zanella (Department of Decision Sciences, Bocconi University) giacomo.zanella@unibocconi.it; Brenda Betancourt (Department of Statistical Science, Duke University) bb222@stat.duke.edu; Hanna Wallach (Microsoft Research) hanna@dirichlet.net; Jeffrey Miller (Department of Biostatistics, Harvard University) jwmiller@hsph.harvard.edu; Abbas Zaidi (Department of Statistical Science, Duke University) amz19@stat.duke.edu; Rebecca C. Steorts (Departments of Statistical Science and Computer Science, Duke University) beka@stat.duke.edu
Pseudocode | No | The paper describes algorithms (e.g., the 'reseating algorithm' and the 'chaperones algorithm') but does not present them in a structured pseudocode or algorithm block.
Open Source Code | No | No statement or link regarding the release of open-source code for the methodology described in the paper was found.
Open Datasets | Yes | NLTCS5000: We derived this data set from the National Long Term Care Survey (NLTCS), a longitudinal survey of older Americans, conducted roughly every six years. ... (footnote 5: http://www.nltcs.aas.duke.edu/) and Syria2000 and SyriaSizes: We constructed these data sets from data collected by four human-rights groups between 2011 and 2014 on people killed in the Syrian conflict [19, 20].
Dataset Splits | No | The paper refers to 'data sets' but does not explicitly specify training, validation, or test splits, nor does it provide details on how the data were partitioned for model development and evaluation.
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments are provided in the paper.
Software Dependencies | No | The paper mentions methods such as 'slice sampling [17]' but does not name specific software dependencies (e.g., library or solver names with version numbers) used for the implementation or experiments.
Experiment Setup | Yes | For the NBNB model and the NBD model, we set a and q to reflect a weakly informative prior belief that E[K] = N/2. For the NBNB model, we set η_r = s_r = 1 and u_p = v_p = 2. For the NBD model, we set α = 1 and set µ^(0) to be a geometric distribution over N = {1, 2, . . .} with a parameter of 0.5.
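
As a rough illustration of the quoted prior settings, the sketch below instantiates them with SciPy. The negative-binomial parameterization, the specific value of q, and the Monte Carlo check are assumptions made for illustration only; the paper reports the hyperprior values and the implied prior mean, not concrete (a, q) values or code.

```python
from scipy import stats

# Illustrative sketch only: the quoted setup gives the hyperprior values and the
# implied prior mean E[K] = N/2, but not concrete (a, q) values. The NegBin
# parameterization and the choice of q below are assumptions.

N = 5000                      # e.g. number of records in NLTCS5000
q = 0.5                       # assumed success probability for K ~ NegBin(a, q)
a = (N / 2) * q / (1 - q)     # pick a so that E[K] = a * (1 - q) / q = N / 2

# NBNB model hyperpriors quoted above: Gamma prior on r, Beta prior on p
eta_r, s_r = 1.0, 1.0         # eta_r = s_r = 1
u_p, v_p = 2.0, 2.0           # u_p = v_p = 2

# NBD model settings quoted above: alpha = 1 and a geometric base measure mu^(0)
alpha = 1.0
mu0 = stats.geom(p=0.5)       # Geometric(0.5) over cluster sizes {1, 2, ...}

# Monte Carlo sanity check that the prior mean of K is roughly N / 2
K_draws = stats.nbinom(n=a, p=q).rvs(size=10_000, random_state=0)
print(round(K_draws.mean()))  # close to 2500 for N = 5000
```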