Power k-Means Clustering

Authors: Jason Xu, Kenneth Lange

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "... reaping marked improvements when used together, as demonstrated on a suite of simulated and real data examples. As we will see in empirical studies, seeding methods can be immediately applied together with our algorithm, furthering the advantages it confers."
Researcher Affiliation | Academia | "1 Department of Statistical Science, Duke University; 2 Departments of Biomathematics, Statistics, and Human Genetics, UCLA. Correspondence to: Jason Xu <jason.q.xu@duke.edu>."
Pseudocode | Yes | Algorithm 1 presents pseudocode for the power k-means algorithm.
Open Source Code | No | The paper states "All simulations were implemented in Julia (Bezanson et al., 2017)" but does not provide concrete access to the source code for the described methodology.
Open Datasets | Yes | "The Supplement includes further comparisons on the BIRCH (n = 100,000, d = 2) and MNIST (n = 60,000, d = 784) benchmark datasets, as well as additional results in terms of adjusted Rand index (ARI) and under uniform random initializations. We now analyze protein expression data collected in a murine study of trisomy 21, more commonly known as Down syndrome (Ahmed et al., 2015; Higuera et al., 2015)."
Dataset Splits | No | The paper uses simulated data and benchmark datasets (BIRCH, MNIST, protein expression data) but gives no details on how these datasets were split into training, validation, or test sets for reproducibility.
Hardware Specification | No | The paper states, "All simulations were implemented in Julia (Bezanson et al., 2017) and conducted on a standard Macbook laptop." However, "standard Macbook laptop" lacks specific hardware details such as CPU model, GPU, or memory.
Software Dependencies | No | The paper mentions that "All simulations were implemented in Julia (Bezanson et al., 2017)" but does not give version numbers for Julia or any other software dependencies required for replication.
Experiment Setup | Yes | "All algorithms were run until relative change in objective fell below ϵ = 10^-6 ... and initial power s0 under matched initial centers, seeded using k-means++. As we show in Section 4, a default rule s_{m+1} = η s_m with η = 1.05 and s0 < 0 is successful across synthetic and real datasets from multiple domains of varying size n and dimension d."