Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances
Authors: Vincent Cohen-Addad, Vahab Mirrokni, Peilin Zhong
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present an empirical study of algorithms validating their effectiveness.5. Experiments In addition to the formal analysis of the theoretical guarantees of our k-means algorithms, we also conduct preliminary experiments on both synthetic datasets and real datasets in the offline setting to further investigate their practical performances. |
| Researcher Affiliation | Industry | Vincent Cohen-Addad * 1 Vahab Mirrokni * 1 Peilin Zhong * 1 1Google Research. Correspondence to: Vincent Cohen-Addad <cohenaddad@google.com>, Vahab Mirrokni <mirrokni@google.com>, Peilin Zhong <peilinz@google.com>. |
| Pseudocode | Yes | Algorithm 1 Near Neighbor Graph via LSH, Algorithm 2 Candidates of Optimal Clusters, Algorithm 3 Tree Construction over Candidate Clusters, Algorithm 4 Conversion to A Binary Tree, Algorithm 5 Dynamic Programming for Exact Solution, Algorithm 6 Dynamic Programming for Approximate Solution, Algorithm 7 Framework for Dynamic Programming in MPC, Algorithm 8 LSH for Points in Euclidean Spaces. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | All datasets can be obtained by sklearn.datasets.fetch openml of the Python scikit-learn package. |
| Dataset Splits | No | The paper describes experiments on synthetic and real datasets but does not explicitly provide specific training/validation/test dataset splits. |
| Hardware Specification | Yes | We ran experiments on a machine with 16G RAM and Intel Core i7-3720QM@2.60GHz CPU. All experiments were in single threaded mode. |
| Software Dependencies | No | All codes are in C++. ... k-means solver using k-means++ seeding implemeted by Python scikit-learn package (Pedregosa et al., 2011). No specific version numbers for C++ compiler or scikit-learn are provided. |
| Experiment Setup | Yes | We use the same choice as (Datar et al., 2004), i.e., r = 4 and m = 10, and thus we choose t of Algorithm 1 to be 100. For Algorithm 2, instead of running Algorithm 1 for each scale 1.01i, we run for each scale 1.1i. For Algorithm 6, we choose ε = 0.1. |