Why the Rich Get Richer? On the Balancedness of Random Partition Models
Authors: Changwoo J Lee, Huiyan Sang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of balance-seeking random partition for the ER task using the Survey of Income and Program Participation data (U.S. Census Bureau, 2009). |
| Researcher Affiliation | Academia | Department of Statistics, Texas A&M University, Texas, USA. |
| Pseudocode | No | The paper describes algorithms and inference steps in text and mathematical formulas (e.g., in Appendix C), but it does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | The software for the described posterior inference algorithms will be available in R package microclustr (Steorts et al., 2020). |
| Open Datasets | Yes | We use the same dataset (SIPP1000) that Betancourt et al. (2020) used to benchmark the performance; the database with n = 4116 (number of records) and K+ = 1000 (number of entities) was collected from the five waves of the longitudinal survey performed between 2005 and 2006 (U.S. Census Bureau, 2009). |
| Dataset Splits | No | The paper conducts experiments and simulation studies, but does not specify training, validation, or testing splits for the datasets used. Evaluation metrics like FNR and FDR are reported on the dataset directly, implying a full-data evaluation rather than a split methodology. |
| Hardware Specification | Yes | All computations were performed on an Intel E5-2690 v3 CPU with 128GB of memory. |
| Software Dependencies | No | The paper mentions that software will be available in 'R package microclustr', but it does not specify version numbers for R, the package, or any other software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | Hyperpriors and MCMC specification details are described in Appendix D. We collect 15000 posterior samples after 5000 burn-in iterations, where we update cluster indicators (z_i) for each individual within each (global) MCMC iteration. |
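The FNR and FDR mentioned under Dataset Splits are, in the entity-resolution literature, usually computed over record pairs. The sketch below assumes those pairwise definitions (FNR = FN / (TP + FN), FDR = FP / (TP + FP), where a pair is a "link" when both records carry the same label); the paper's exact evaluation code is not shown here, and the function name `pairwise_fnr_fdr` is hypothetical.

```python
from itertools import combinations


def pairwise_fnr_fdr(true_labels, est_labels):
    """Pairwise false negative rate and false discovery rate (hypothetical helper).

    Assumption: a pair of records is a link when both records share a label.
    FNR = FN / (TP + FN), FDR = FP / (TP + FP); an empty denominator gives 0.0.
    """
    if len(true_labels) != len(est_labels):
        raise ValueError("label vectors must have equal length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        true_link = true_labels[i] == true_labels[j]
        est_link = est_labels[i] == est_labels[j]
        if true_link and est_link:
            tp += 1  # correctly linked pair
        elif est_link:
            fp += 1  # linked by the estimate but not in truth
        elif true_link:
            fn += 1  # linked in truth but missed by the estimate
    fnr = fn / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return fnr, fdr


# Toy example: five records, three true entities.
print(pairwise_fnr_fdr([0, 0, 1, 1, 2], [0, 0, 0, 1, 2]))
```

For the toy labels there is one true-positive pair, two false-positive pairs and one false-negative pair, so FNR = 0.5 and FDR = 2/3. The loop is quadratic in the number of records; for n = 4116 that is about 8.5 million pairs, which is workable but slow in pure Python.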
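The Experiment Setup row describes the sampling schedule: 5000 burn-in iterations, then 15000 retained samples, with every cluster indicator z_i updated once per global iteration. The sketch below illustrates only that control flow. As a stand-in assumption it samples from a Chinese restaurant process prior (the classic "rich get richer" partition model) with no entity-resolution likelihood; it is not the paper's balance-seeking model or the microclustr sampler, and the function name `crp_gibbs_sweeps` is hypothetical.

```python
import random
from collections import Counter


def crp_gibbs_sweeps(n, alpha, n_burnin, n_keep, seed=0):
    """Prior-only Gibbs sampler for a Chinese restaurant process (illustrative).

    Each global iteration sweeps over all n indicators z_i. Returns the number
    of clusters in each retained sample (burn-in samples are discarded).
    """
    rng = random.Random(seed)
    z = [0] * n  # start with all records in one cluster
    kept = []
    for it in range(n_burnin + n_keep):
        for i in range(n):  # one global iteration updates every z_i
            counts = Counter(z[:i] + z[i + 1:])  # cluster sizes excluding record i
            labels = list(counts)
            # Existing cluster weight = its size: larger clusters attract more ("rich get richer").
            weights = [counts[k] for k in labels]
            labels.append(max(z) + 1)  # an unused label for a brand-new cluster
            weights.append(alpha)  # new-cluster weight is the concentration alpha
            z[i] = rng.choices(labels, weights=weights)[0]
        if it >= n_burnin:  # keep only post-burn-in samples
            kept.append(len(set(z)))
    return kept


sizes = crp_gibbs_sweeps(n=20, alpha=1.0, n_burnin=50, n_keep=200)
print(sum(sizes) / len(sizes))  # Monte Carlo estimate of the expected number of clusters
```

Setting `n_burnin=5000` and `n_keep=15000` matches the stated schedule (20000 global iterations in total), but this loop recomputes cluster counts from scratch for every record, so it is meant to show the structure of the sweeps, not to be fast.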