Improved Guarantees for k-means++ and k-means++ Parallel

Authors: Konstantin Makarychev, Aravind Reddy, Liren Shan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide novel analyses and show improved approximation and bi-criteria approximation guarantees for k-means++ and k-means‖. Our results give a better theoretical justification for why these algorithms perform extremely well in practice.
Researcher Affiliation | Academia | Konstantin Makarychev, Aravind Reddy, Liren Shan; Department of Computer Science, Northwestern University, Evanston, IL, USA
Pseudocode | Yes | Algorithm 1: k-means++ seeding; Algorithm 2: k-means‖ seeding; Algorithm 3: k-means‖Pois seeding
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It mentions that k-means++ is used by various libraries but does not state that the authors are releasing their own code.
Open Datasets | Yes | Bio Test from KDD-Cup 2004 (Elber, 2004) and COVTYPE from the UCI ML Repository (Dua & Graff, 2017)
Dataset Splits | No | The paper mentions running the algorithms for 50 iterations on the datasets but does not provide the split information (percentages, sample counts, or splitting methodology) needed to reproduce a partition into train, validation, and test sets.
Hardware Specification | No | The paper does not specify the hardware (CPU/GPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list the ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper mentions running the algorithms for "50 iterations" but does not provide further setup details such as concrete hyperparameter values or other training configuration.
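Algorithm 1 (k-means++ seeding) is the standard D² sampling procedure: the first center is chosen uniformly at random, and each subsequent center is sampled with probability proportional to its squared distance from the nearest center already chosen. The following is a minimal Python sketch of that procedure for illustration only; it is not the authors' implementation, and the function names are ours.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeding(points, k, seed=0):
    """D^2 sampling (k-means++ seeding sketch).

    Picks the first center uniformly at random, then picks each of the
    remaining k-1 centers with probability proportional to the squared
    distance from the nearest already-chosen center.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [dist2(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # Sample an index with probability proportional to d2
        r = rng.random() * sum(d2)
        acc = 0.0
        chosen = len(points) - 1
        for i, w in enumerate(d2):
            acc += w
            if acc >= r:
                chosen = i
                break
        centers.append(points[chosen])
        # Update nearest-center distances with the new center
        d2 = [min(d2[j], dist2(points[j], points[chosen]))
              for j in range(len(points))]
    return centers
```

A point that has already been chosen gets weight zero, so well-separated data yields k distinct centers; the paper's contribution is a sharper analysis of the expected cost of exactly this kind of seeding, not a new implementation.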