Bregman Power k-Means for Clustering Exponential Family Data

Authors: Adithya Vellal, Saptarshi Chakraborty, Jason Q Xu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Additionally, we consider thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings.
Researcher Affiliation | Academia | (1) Department of Statistical Science, Duke University, Durham, NC, USA. (2) Department of Statistics, University of California, Berkeley, CA, USA.
Pseudocode | Yes | Algorithm 1 Bregman Power k-means Pseudocode
Open Source Code | Yes | An open-source Python implementation of the proposed method, including reproducible code for all data generating mechanisms and experiments in this paper, is available and maintained in a repository by the first author. (Footnote 1: Publicly available at https://github.com/avellal14/bregman_power_kmeans)
Open Datasets | Yes | Here we consider data from the Italian region of San Martino di Castrozza, collected across the years 1970-1990 during the months of January (177 points) and June (397 points). Only days with non-zero rainfall amounts are included, and their values are often modeled by domain experts according to a Gamma distribution (Coe & Stern, 1982). (Footnote 2: Publicly available at https://cran.r-project.org/web/packages/hydroTSM/vignettes/hydroTSM_Vignette-knitr.pdf)
Dataset Splits | No | The paper does not explicitly specify training, validation, or test dataset splits. It discusses evaluation on simulated data and a real-world dataset, comparing clustering solutions against ground truth or other methods without partitioning the data into distinct sets for model training, validation, and testing.
Hardware Specification | No | The paper describes the experimental setup and results but does not specify any hardware details like CPU, GPU models, or memory used for running the experiments.
Software Dependencies | No | The paper mentions an "open-source Python implementation" but does not provide specific version numbers for Python or any other software dependencies, libraries, or packages used in the experiments.
Experiment Setup | Yes | Centers are randomly initialized according to a uniform distribution spanning the range of all the data points, and each peer method starts from matched initializations to ensure a fair comparison. An s0 value of 0.2 was used for power k-means and our method. (...) For Power k-means and Bregman Power k-means, an s0 value of 3.0 is used. (...) Mean ARIs and standard deviations across 250 random trials are detailed in Table 3, which also considers various initial powers s0 for power k-means and our proposed method.
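
The initialization scheme quoted under Experiment Setup (centers drawn uniformly over the range of the data, with every peer method starting from the same centers in a given trial) could be sketched roughly as below. This is an illustration only and is not taken from the authors' repository: the function names `uniform_init` and `matched_inits`, the seeds, the number of trials, and the synthetic Gamma-distributed data are all assumptions made for the example.

```python
import numpy as np


def uniform_init(X, k, rng):
    """Draw k centers uniformly within the per-feature range of X (hypothetical helper)."""
    lo = X.min(axis=0)  # per-feature minimum over all data points
    hi = X.max(axis=0)  # per-feature maximum over all data points
    return rng.uniform(lo, hi, size=(k, X.shape[1]))


def matched_inits(X, k, n_trials, seed=0):
    """Build one set of initial centers per trial (hypothetical helper).

    Each competing method would be started from inits[t] in trial t,
    so that differences in outcome are not due to the starting point.
    """
    rng = np.random.default_rng(seed)
    return [uniform_init(X, k, rng) for _ in range(n_trials)]


# Synthetic positive-valued data, for illustration only (not the rainfall data).
data_rng = np.random.default_rng(42)
X = data_rng.gamma(shape=2.0, scale=1.5, size=(200, 2))

# Five trials here to keep the example small; the quoted setup reports 250.
inits = matched_inits(X, k=3, n_trials=5)
```

In a comparison loop, each method would receive a copy of `inits[t]` for trial `t`, and the per-trial scores would then be averaged across trials.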