Improved Guarantees for k-means++ and k-means++ Parallel
Authors: Konstantin Makarychev, Aravind Reddy, Liren Shan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide novel analyses and show improved approximation and bi-criteria approximation guarantees for k-means++ and k-means‖. Our results give a better theoretical justification for why these algorithms perform extremely well in practice. |
| Researcher Affiliation | Academia | Konstantin Makarychev Aravind Reddy Liren Shan Department of Computer Science Northwestern University Evanston, IL, USA |
| Pseudocode | Yes | Algorithm 1: k-means++ seeding; Algorithm 2: k-means‖ seeding; Algorithm 3: k-means‖ Pois seeding |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It mentions that k-means++ is used by various libraries but does not state that the authors are releasing their own code. |
| Open Datasets | Yes | Bio Test from KDD-Cup 2004 (Elber, 2004) and COVTYPE from the UCI ML repository (Dua & Graff, 2017) |
| Dataset Splits | No | The paper mentions running algorithms for 50 iterations on datasets but does not provide specific dataset split information (percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning into train, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions running algorithms for "50 iterations" but does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size) or other training configurations. |
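The k-means++ seeding procedure named in the Pseudocode row (Algorithm 1) is the standard D² sampling scheme: pick the first center uniformly at random, then repeatedly sample the next center with probability proportional to each point's squared distance to its nearest center so far. A minimal sketch of that standard procedure, not the authors' released code (the function and helper names here are our own):

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two tuples of coordinates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeding(points, k, rng=None):
    """Choose k initial centers from `points` by D^2 sampling (k-means++)."""
    rng = rng or random.Random(0)
    # First center: uniform over the input points.
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # Sample the next center with probability proportional to d2.
        r = rng.uniform(0.0, sum(d2))
        acc = 0.0
        for i, w in enumerate(d2):
            acc += w
            if acc >= r:
                break
        centers.append(points[i])
        # Each point keeps the smaller of its old distance and the
        # distance to the newly added center.
        d2 = [min(d2[j], sq_dist(points[j], points[i]))
              for j in range(len(points))]
    return centers
```

The k-means‖ variant (Algorithm 2) differs by oversampling several centers per round in parallel and then reclustering them down to k; the D² weighting above is the piece the two algorithms share.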