A Better k-means++ Algorithm via Local Search
Authors: Silvio Lattanzi, Christian Sohler
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm empirically and show that it also improves the quality of the solution in practice. |
| Researcher Affiliation | Industry | 1Google Research, Zurich, ZH, Switzerland. Correspondence to: Silvio Lattanzi <silviol@google.com>, Christian Sohler <sohler@google.com>. |
| Pseudocode | Yes | Algorithm 1: k-means++ seeding with local search; Algorithm 2: Local Search++ (a code sketch follows the table) |
| Open Source Code | No | The paper does not provide any explicit statements about making the source code available or include links to code repositories. |
| Open Datasets | Yes | RNA: 8 features from 488565 RNA input sequence pairs (Uzilov et al., 2006); KDD-BIO: 145751 samples with 74 features measuring the match between a protein and a native sequence (KDD); KDD-PHY: 100000 samples with 78 features representing a quantum physics task (KDD) |
| Dataset Splits | No | The paper discusses applying k-means clustering to datasets but does not specify any training, validation, or test dataset splits, which is expected since k-means is typically applied to the entire dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We consider k-means clustering for k = 25 or 50 on 3 different datasets. We stop Lloyd's algorithm when the incremental improvement of an iteration is small; in particular, after 10 steps of Lloyd's the observed improvement was less than 0.4% per iteration for all considered datasets and all choices of the number of centers (a stopping-rule sketch follows the table). |
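
The Pseudocode row points to Algorithm 1 (k-means++ seeding with local search) and Algorithm 2 (Local Search++). Below is a minimal NumPy sketch of that scheme, assuming the usual D²-sampling seeding followed by swap-based local search; the function names, the brute-force swap evaluation, and the `num_swaps` parameter are illustrative choices written for clarity rather than memory efficiency, not the authors' implementation (the paper's analysis uses O(k log log k) swap rounds).

```python
import numpy as np

def km_cost(X, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return np.min(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1).sum()

def d2_sample(X, centers, rng):
    """Sample one point with probability proportional to its squared
    distance to the nearest current center (D^2 sampling)."""
    d2 = np.min(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def kmeanspp_seeding(X, k, rng):
    """Standard k-means++ seeding: first center uniform, the rest by D^2 sampling."""
    centers = [X[rng.choice(len(X))]]
    for _ in range(k - 1):
        centers.append(d2_sample(X, np.asarray(centers), rng))
    return np.asarray(centers, dtype=float)

def local_search_pp(X, k, num_swaps, rng):
    """Sketch of Local Search++: after seeding, repeatedly D^2-sample a candidate
    point and swap it with the center whose replacement lowers the cost the most,
    keeping the swap only if it improves the solution."""
    centers = kmeanspp_seeding(X, k, rng)
    for _ in range(num_swaps):
        p = d2_sample(X, centers, rng)
        best_cost, best_i = km_cost(X, centers), None
        for i in range(k):
            trial = centers.copy()
            trial[i] = p
            c = km_cost(X, trial)
            if c < best_cost:
                best_cost, best_i = c, i
        if best_i is not None:
            centers[best_i] = p
    return centers
```

A typical call would look like `local_search_pp(X, k=25, num_swaps=100, rng=np.random.default_rng(0))`, with `num_swaps` chosen on the order of k log log k.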
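
The Experiment Setup row quotes a relative-improvement stopping rule for Lloyd's algorithm. The snippet below sketches such a rule under the assumption that iteration stops once one full Lloyd step improves the cost by less than a fixed fraction; the function name, `rel_tol=0.004`, and `max_iter` are illustrative, since the paper only reports the observed improvement (below 0.4% per iteration after 10 steps) rather than an exact threshold.

```python
import numpy as np

def lloyd_with_relative_stop(X, centers, rel_tol=0.004, max_iter=100):
    """Lloyd's algorithm, stopped when the relative per-iteration improvement
    of the k-means cost drops below rel_tol (illustrative 0.4% threshold)."""
    centers = np.asarray(centers, dtype=float).copy()

    def km_cost(C):
        return np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1).sum()

    prev = km_cost(centers)
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest center.
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # Update step: move every non-empty cluster's center to the mean of its points.
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
        cur = km_cost(centers)
        if prev > 0 and (prev - cur) / prev < rel_tol:
            break
        prev = cur
    return centers
```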