Improved Guarantees for k-means++ and k-means++ Parallel

Authors: Konstantin Makarychev, Aravind Reddy, Liren Shan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide novel analyses and show improved approximation and bi-criteria approximation guarantees for k-means++ and k-means‖. Our results give a better theoretical justification for why these algorithms perform extremely well in practice.
Researcher Affiliation | Academia | Konstantin Makarychev, Aravind Reddy, Liren Shan; Department of Computer Science, Northwestern University, Evanston, IL, USA
Pseudocode | Yes | Algorithm 1: k-means++ seeding; Algorithm 2: k-means‖ seeding; Algorithm 3: k-means‖Pois seeding
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It mentions that k-means++ is used by various libraries but does not state that the authors are releasing their own code.
Open Datasets | Yes | Bio Test from KDD-Cup 2004 (Elber, 2004) and COVTYPE from the UCI ML Repository (Dua & Graff, 2017)
Dataset Splits | No | The paper mentions running the algorithms for 50 iterations on the datasets but does not provide the split information (percentages, sample counts, or splitting methodology) needed to reproduce a partition into train, validation, and test sets.
Hardware Specification | No | The paper does not specify the hardware (CPU/GPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list the ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper mentions running the algorithms for "50 iterations" but does not provide further setup details such as concrete hyperparameter values or other training configuration.
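Algorithm 1 (k-means++ seeding) is the standard D² sampling procedure: the first center is chosen uniformly at random, and each subsequent center is sampled with probability proportional to its squared distance from the nearest center already chosen. The following is a minimal Python sketch of that procedure for illustration only; it is not the authors' implementation, and the function names are ours.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeding(points, k, seed=0):
    """D^2 sampling (k-means++ seeding sketch).

    Picks the first center uniformly at random, then picks each of the
    remaining k-1 centers with probability proportional to the squared
    distance from the nearest already-chosen center.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [dist2(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # Sample an index with probability proportional to d2
        r = rng.random() * sum(d2)
        acc = 0.0
        chosen = len(points) - 1
        for i, w in enumerate(d2):
            acc += w
            if acc >= r:
                chosen = i
                break
        centers.append(points[chosen])
        # Update nearest-center distances with the new center
        d2 = [min(d2[j], dist2(points[j], points[chosen]))
              for j in range(len(points))]
    return centers
```

A point that has already been chosen gets weight zero, so well-separated data yields k distinct centers; the paper's contribution is a sharper analysis of the expected cost of exactly this kind of seeding, not a new implementation.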