Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances

Authors: Vincent Cohen-Addad, Vahab Mirrokni, Peilin Zhong

ICML 2022

Reproducibility checklist (Variable / Result / LLM Response):
Research Type: Experimental. "Finally, we present an empirical study of algorithms validating their effectiveness." (Section 5, Experiments:) "In addition to the formal analysis of the theoretical guarantees of our k-means algorithms, we also conduct preliminary experiments on both synthetic datasets and real datasets in the offline setting to further investigate their practical performances."
Researcher Affiliation: Industry. "Vincent Cohen-Addad*1, Vahab Mirrokni*1, Peilin Zhong*1; 1Google Research. Correspondence to: Vincent Cohen-Addad <cohenaddad@google.com>, Vahab Mirrokni <mirrokni@google.com>, Peilin Zhong <peilinz@google.com>."
Pseudocode: Yes. Algorithm 1 (Near Neighbor Graph via LSH), Algorithm 2 (Candidates of Optimal Clusters), Algorithm 3 (Tree Construction over Candidate Clusters), Algorithm 4 (Conversion to A Binary Tree), Algorithm 5 (Dynamic Programming for Exact Solution), Algorithm 6 (Dynamic Programming for Approximate Solution), Algorithm 7 (Framework for Dynamic Programming in MPC), Algorithm 8 (LSH for Points in Euclidean Spaces).
Open Source Code: No. The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets: Yes. "All datasets can be obtained by sklearn.datasets.fetch_openml of the Python scikit-learn package."
Dataset Splits: No. The paper describes experiments on synthetic and real datasets but does not explicitly provide specific training/validation/test dataset splits.
Hardware Specification: Yes. "We ran experiments on a machine with 16G RAM and Intel Core i7-3720QM@2.60GHz CPU. All experiments were in single threaded mode."
Software Dependencies: No. "All codes are in C++. ... k-means solver using k-means++ seeding implemented by Python scikit-learn package (Pedregosa et al., 2011)." No specific version numbers for the C++ compiler or scikit-learn are provided.
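The scikit-learn baseline mentioned above relies on k-means++ seeding (Arthur & Vassilvitskii, 2007). For illustration only, here is a minimal numpy re-implementation of that seeding rule; the function name and test data are ours, not the paper's or scikit-learn's code:

```python
import numpy as np

def kmeans_pp_seeds(X, k, rng):
    """k-means++ seeding: pick the first center uniformly at random,
    then each subsequent center with probability proportional to the
    squared distance to the nearest already-chosen center."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # squared distance from every point to its nearest chosen center
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# three well-separated synthetic blobs around (0,0), (5,5), (10,10)
X = np.concatenate([rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
seeds = kmeans_pp_seeds(X, 3, rng)
print(seeds)  # with high probability, one seed per blob
```

Because seeding weights favor points far from existing centers, well-separated clusters each tend to receive exactly one seed, which is why k-means++ is the standard baseline initialization.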
Experiment Setup: Yes. "We use the same choice as (Datar et al., 2004), i.e., r = 4 and m = 10, and thus we choose t of Algorithm 1 to be 100. For Algorithm 2, instead of running Algorithm 1 for each scale 1.01^i, we run for each scale 1.1^i. For Algorithm 6, we choose ε = 0.1."
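The parameters above are those of the p-stable Euclidean LSH of Datar et al. (2004): each of t hash tables concatenates m hashes of the form h(x) = floor((a·x + b)/r). The following numpy sketch illustrates that scheme with the paper's r = 4 and m = 10 (names and the toy data are ours, assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_table(dim, r=4.0, m=10, rng=rng):
    """One hash table: m Gaussian projections a ~ N(0, I) and offsets b ~ U[0, r)."""
    A = rng.normal(size=(m, dim))
    b = rng.uniform(0.0, r, size=m)
    return A, b

def bucket(x, table, r=4.0):
    """Concatenated key: h_i(x) = floor((a_i . x + b_i) / r) for i = 1..m."""
    A, b = table
    return tuple(np.floor((A @ x + b) / r).astype(int))

dim, t = 8, 100          # t = 100 tables, as chosen for Algorithm 1
tables = [make_table(dim) for _ in range(t)]

x = rng.normal(size=dim)
near = x + 0.01 * rng.normal(size=dim)   # a close neighbor of x
far = x + 100.0 * rng.normal(size=dim)   # a distant point

near_hits = sum(bucket(x, T) == bucket(near, T) for T in tables)
far_hits = sum(bucket(x, T) == bucket(far, T) for T in tables)
print(near_hits, far_hits)  # nearby points collide in far more tables
```

Running Algorithm 1 over geometric scales 1.1^i rather than 1.01^i, as the quoted setup does, coarsens the distance scales searched and so trades a slightly weaker approximation for far fewer LSH invocations.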