Flexible Models for Microclustering with Application to Entity Resolution
Authors: Brenda Betancourt, Giacomo Zanella, Jeffrey W. Miller, Hanna Wallach, Abbas Zaidi, Rebecca C. Steorts
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare models within this class to two commonly used clustering models using four entity-resolution data sets. In this section, we compare two entity resolution models based on the NBNB model and the NBD model to two similar models based on the DP mixture model [10] and the PYP mixture model [11]. |
| Researcher Affiliation | Collaboration | Giacomo Zanella, Department of Decision Sciences, Bocconi University (giacomo.zanella@unibocconi.it); Brenda Betancourt, Department of Statistical Science, Duke University (bb222@stat.duke.edu); Hanna Wallach, Microsoft Research (hanna@dirichlet.net); Jeffrey Miller, Department of Biostatistics, Harvard University (jwmiller@hsph.harvard.edu); Abbas Zaidi, Department of Statistical Science, Duke University (amz19@stat.duke.edu); Rebecca C. Steorts, Departments of Statistical Science and Computer Science, Duke University (beka@stat.duke.edu) |
| Pseudocode | No | The paper describes algorithms (e.g., 'reseating algorithm,' 'chaperones algorithm') but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | No statement or link regarding the release of open-source code for the methodology described in the paper was found. |
| Open Datasets | Yes | NLTCS5000: We derived this data set from the National Long Term Care Survey (NLTCS), a longitudinal survey of older Americans, conducted roughly every six years (http://www.nltcs.aas.duke.edu/). ... Syria2000 and SyriaSizes: We constructed these data sets from data collected by four human-rights groups between 2011 and 2014 on people killed in the Syrian conflict [19, 20]. |
| Dataset Splits | No | The paper refers to 'data sets' but does not explicitly specify training, validation, or test splits, nor does it provide details on how the data was partitioned for model development and evaluation. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions methods like 'slice sampling [17]' but does not provide specific software dependencies (e.g., library or solver names with version numbers) used for the implementation or experiments. |
| Experiment Setup | Yes | For the NBNB model and the NBD model, we set a and q to reflect a weakly informative prior belief that E[K] = N/2. For the NBNB model, we set η_r = s_r = 1 and u_p = v_p = 2. For the NBD model, we set α = 1 and set µ^(0) to be a geometric distribution over ℕ = {1, 2, . . .} with a parameter of 0.5. (An illustrative prior-simulation sketch based on these settings follows the table.) |
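
Below is a minimal, illustrative sketch (not the authors' implementation) of forward-simulating cluster sizes from an NBNB-style prior using the hyperparameter values quoted in the Experiment Setup row. The zero-truncation convention, the placeholder values of a and q, and the mapping of (a, q) and (r, p) onto NumPy's `negative_binomial` parameterization are assumptions made for this sketch.

```python
# Illustrative sketch only: forward-simulate cluster sizes under an NBNB-style prior
# using the hyperparameters quoted above. Truncation and parameter mappings are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def truncated_negbin(n, p_success, size, rng):
    """Draw from NegBin(n, p_success) restricted to {1, 2, ...} by rejection sampling."""
    draws = []
    while len(draws) < size:
        x = rng.negative_binomial(n, p_success, size=size)
        draws.extend(x[x >= 1].tolist())
    return np.array(draws[:size])

# Quoted hyperparameters: eta_r = s_r = 1 (Gamma prior on r), u_p = v_p = 2 (Beta prior on p).
eta_r, s_r = 1.0, 1.0
u_p, v_p = 2.0, 2.0
# a and q are placeholders here; the paper chooses them from a prior belief about E[K].
a, q = 1.0, 0.5

r = rng.gamma(eta_r, 1.0 / s_r)                   # r ~ Gamma(eta_r, rate=s_r)
p = rng.beta(u_p, v_p)                            # p ~ Beta(u_p, v_p)
K = int(truncated_negbin(a, 1.0 - q, 1, rng)[0])  # K ~ NegBin(a, q), truncated to K >= 1
sizes = truncated_negbin(r, 1.0 - p, K, rng)      # N_1, ..., N_K ~ NegBin(r, p), truncated to >= 1

print(f"K = {K} clusters; first sizes: {sizes[:10]}; total records N = {sizes.sum()}")
```

The rejection step simply enforces that every cluster contains at least one record; the paper's exact truncation and parameterization conventions for the negative binomial should be checked against its model definitions.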