Bregman Power k-Means for Clustering Exponential Family Data
Authors: Adithya Vellal, Saptarshi Chakraborty, Jason Q Xu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, we consider thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings. |
| Researcher Affiliation | Academia | (1) Department of Statistical Science, Duke University, Durham, NC, USA. (2) Department of Statistics, University of California, Berkeley, CA, USA. |
| Pseudocode | Yes | Algorithm 1: Bregman Power k-means Pseudocode |
| Open Source Code | Yes | An open-source Python implementation of the proposed method, including reproducible code for all data generating mechanisms and experiments in this paper, is available and maintained in a repository by the first author. (Footnote 1: Publicly available at https://github.com/avellal14/bregman_power_kmeans) |
| Open Datasets | Yes | Here we consider data from the Italian region of San Martino di Castrozza, collected across the years 1970–1990 during the months of January (177 points) and June (397 points). Only days with non-zero rainfall amounts are included, and their values are often modeled by domain experts according to a Gamma distribution (Coe & Stern, 1982). (Footnote 2: Publicly available at https://cran.r-project.org/web/packages/hydroTSM/vignettes/hydroTSM_Vignette-knitr.pdf) |
| Dataset Splits | No | The paper does not explicitly specify training, validation, or test dataset splits. It discusses evaluation on simulated data and a real-world dataset, comparing clustering solutions against ground truth or other methods without partitioning the data into distinct sets for model training, validation, and testing. |
| Hardware Specification | No | The paper describes the experimental setup and results but does not specify any hardware details like CPU, GPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions an "open-source Python implementation" but does not provide specific version numbers for Python or any other software dependencies, libraries, or packages used in the experiments. |
| Experiment Setup | Yes | Centers are randomly initialized according to a uniform distribution spanning the range of all the data points, and each peer method starts from matched initializations to ensure a fair comparison. An s0 value of 0.2 was used for power k-means and our method. (...) For Power k-means and Bregman Power k-means, an s0 value of 3.0 is used. (...) Mean ARIs and standard deviations across 250 random trials are detailed in Table 3, which also considers various initial powers s0 for power k-means and our proposed method. |
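
The initialization described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `init_centers_uniform`, the per-coordinate bounding-box reading of "the range of all the data points", the seed handling, and the toy data are all assumptions made for the example. It uses only the Python standard library.

```python
import random


def init_centers_uniform(points, k, seed):
    """Draw k centers uniformly from the bounding box of `points`.

    `points` is a list of equal-length lists of floats. Sampling each
    coordinate independently between its observed min and max is an
    assumed reading of "uniform over the range of the data".
    """
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    dim = len(points[0])
    lows = [min(p[j] for p in points) for j in range(dim)]
    highs = [max(p[j] for p in points) for j in range(dim)]
    return [
        [rng.uniform(lows[j], highs[j]) for j in range(dim)]
        for _ in range(k)
    ]


# Toy data (hypothetical values, for illustration only).
data = [[0.5, 2.0], [1.5, 4.0], [3.0, 1.0], [2.5, 3.5]]

# Matched initializations: draw the centers once per trial, then hand each
# competing method its own copy of the same starting point.
shared_init = init_centers_uniform(data, k=3, seed=0)
init_for_method_a = [center[:] for center in shared_init]
init_for_method_b = [center[:] for center in shared_init]
```

Copying the shared centers, rather than redrawing them for each method, is what keeps a comparison across methods from being confounded by different starting points within a trial.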