Power k-Means Clustering
Authors: Jason Xu, Kenneth Lange
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports empirical evaluations: "...reaping marked improvements when used together as demonstrated on a suite of simulated and real data examples. As we will see in empirical studies, seeding methods can be immediately applied together with our algorithm, furthering the advantages it confers." |
| Researcher Affiliation | Academia | 1Department of Statistical Science, Duke University 2Departments of Biomathematics, Statistics, and Human Genetics, UCLA. Correspondence to: Jason Xu <jason.q.xu@duke.edu>. |
| Pseudocode | Yes | Algorithm 1 Power k-means algorithm pseudocode |
| Open Source Code | No | The paper states "All simulations were implemented in Julia (Bezanson et al., 2017)" but does not provide concrete access to the source code for the methodology described. |
| Open Datasets | Yes | The Supplement includes further comparisons on the BIRCH (n = 100 000, d = 2) and MNIST (n = 60 000, d = 784) benchmark datasets, as well as additional results in terms of adjusted Rand index (ARI) and under uniform random initializations. We now analyze protein expression data collected in a murine study of trisomy 21, more commonly known as Down syndrome (Ahmed et al., 2015; Higuera et al., 2015). |
| Dataset Splits | No | The paper describes using simulated data and benchmark datasets (BIRCH, MNIST, Protein data) but does not provide specific details on how these datasets were split into training, validation, or test sets for reproducibility. |
| Hardware Specification | No | The paper states, "All simulations were implemented in Julia (Bezanson et al., 2017) and conducted on a standard Macbook laptop." However, "standard Macbook laptop" lacks specific hardware details such as CPU model, GPU, or memory. |
| Software Dependencies | No | The paper mentions "All simulations were implemented in Julia (Bezanson et al., 2017)", but it does not provide specific version numbers for Julia or any other ancillary software dependencies required for replication. |
| Experiment Setup | Yes | All algorithms were run until the relative change in the objective fell below ϵ = 10⁻⁶, with initial power s0 and matched initial centers seeded using k-means++. As shown in Section 4, a default rule sm+1 = ηsm with η = 1.05 and s0 < 0 is successful across synthetic and real datasets from multiple domains of varying size n and dimension d. |
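To make the experiment-setup row concrete, the sketch below illustrates the power k-means MM updates with the annealing rule sm+1 = ηsm (η = 1.05, s0 < 0) and the ϵ = 10⁻⁶ relative-change stopping rule described above. This is a minimal NumPy illustration, not the authors' Julia implementation; the initialization option, the log-space computation of the power mean (for numerical stability as s grows very negative), and the distance floor are all assumptions of this sketch.

```python
import numpy as np

def power_kmeans(X, k, s0=-1.0, eta=1.05, tol=1e-6, max_iter=1000,
                 init=None, seed=0):
    """Sketch of power k-means (Xu & Lange, 2019).

    Minimizes sum_i M_s(d_i1, ..., d_ik), where d_ij = ||x_i - theta_j||^2
    and M_s is the power mean; annealing s -> -inf drives M_s toward min,
    i.e. toward the ordinary k-means objective.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Assumed initialization: user-supplied centers, else a random sample
    # of data points (the paper seeds comparisons with k-means++ instead).
    centers = (np.asarray(init, dtype=float).copy() if init is not None
               else X[rng.choice(n, k, replace=False)].copy())
    s, prev_obj = s0, np.inf
    for _ in range(max_iter):
        # Squared Euclidean distances, shape (n, k); work in log space so
        # that d^s stays finite once |s| is large.
        D = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        logD = np.log(np.maximum(D, 1e-300))
        a = s * logD
        amax = a.max(axis=1, keepdims=True)
        # log of the inner mean: log((1/k) * sum_j d_ij^s)
        log_inner = amax[:, 0] + np.log(np.exp(a - amax).mean(axis=1))
        obj = np.exp(log_inner / s).sum()       # power-mean objective value
        # MM weights w_ij = inner_i^{1/s - 1} * d_ij^{s - 1}
        # (gradient of the concave power mean at the current distances)
        W = np.exp((1.0 / s - 1.0) * log_inner[:, None] + (s - 1.0) * logD)
        # Weighted-mean center update from the quadratic surrogate
        centers = (W[:, :, None] * X[:, None, :]).sum(0) / W.sum(0)[:, None]
        if np.isfinite(prev_obj) and abs(prev_obj - obj) / prev_obj < tol:
            break                               # relative change below eps
        prev_obj = obj
        s *= eta                                # anneal: s_{m+1} = eta * s_m
    return centers
```

With s0 < 0 and η > 1, the power s moves toward −∞, so early iterations optimize a smoothed surrogate (s0 = −1 corresponds to the harmonic mean, as in k-harmonic means) and later iterations approach the k-means objective itself.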