Power k-Means Clustering

Authors: Jason Xu, Kenneth Lange

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "... reaping marked improvements when used together, as demonstrated on a suite of simulated and real data examples. As we will see in empirical studies, seeding methods can be immediately applied together with our algorithm, furthering the advantages it confers."
Researcher Affiliation | Academia | "1 Department of Statistical Science, Duke University; 2 Departments of Biomathematics, Statistics, and Human Genetics, UCLA. Correspondence to: Jason Xu <jason.q.xu@duke.edu>."
Pseudocode | Yes | Algorithm 1 presents pseudocode for the power k-means algorithm.
Open Source Code | No | The paper states "All simulations were implemented in Julia (Bezanson et al., 2017)" but does not provide concrete access to the source code for the described methodology.
Open Datasets | Yes | "The Supplement includes further comparisons on the BIRCH (n = 100,000, d = 2) and MNIST (n = 60,000, d = 784) benchmark datasets, as well as additional results in terms of adjusted Rand index (ARI) and under uniform random initializations. We now analyze protein expression data collected in a murine study of trisomy 21, more commonly known as Down syndrome (Ahmed et al., 2015; Higuera et al., 2015)."
Dataset Splits | No | The paper uses simulated data and benchmark datasets (BIRCH, MNIST, protein expression data) but gives no details on how these datasets were split into training, validation, or test sets for reproducibility.
Hardware Specification | No | The paper states, "All simulations were implemented in Julia (Bezanson et al., 2017) and conducted on a standard Macbook laptop." However, "standard Macbook laptop" lacks specific hardware details such as CPU model, GPU, or memory.
Software Dependencies | No | The paper mentions that "All simulations were implemented in Julia (Bezanson et al., 2017)" but does not give version numbers for Julia or any other software dependencies required for replication.
Experiment Setup | Yes | "All algorithms were run until relative change in objective fell below ϵ = 10^-6 ... and initial power s0 under matched initial centers, seeded using k-means++. As we show in Section 4, a default rule s_{m+1} = η s_m with η = 1.05 and s0 < 0 is successful across synthetic and real datasets from multiple domains of varying size n and dimension d."