Multi-Swap k-Means++

Authors: Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis

NeurIPS 2023

Each entry below lists a reproducibility variable, the assessed result, and the supporting LLM response:
Research Type: Experimental. "We provide experiments where we compare against k-means++ and Lattanzi and Sohler [2019]. We study a variant of our algorithm that performs very competitively with our theoretically sound algorithm. The variant is very efficient and still outperforms previous work in terms of solution quality, even after the standard postprocessing using Lloyd."
Researcher Affiliation: Collaboration. Lorenzo Beretta (University of Copenhagen, lorenzo2beretta@gmail.com); Vincent Cohen-Addad (Google Research, cohenaddad@google.com); Silvio Lattanzi (Google Research, silviol@google.com); Nikos Parotsidis (Google Research, nikosp@google.com).
Pseudocode: No. The paper describes the algorithm steps in paragraph form within Section 3, "The Algorithm," but does not provide structured pseudocode or an algorithm block.
Open Source Code: No. "All our code is written in Python. The code will be made available upon publication of this work."
Open Datasets: Yes. "Datasets. We consider the three datasets used in Lattanzi and Sohler [2019] to evaluate the performance of SSLS: 1) KDD-PHY, 100,000 points with 78 features representing a quantum physics task (kdd [2004]); 2) RNA, 488,565 points with 8 features representing RNA input sequence pairs (Uzilov et al. [2006]); and 3) KDD-BIO, 145,751 points with 74 features measuring the match between a protein and a native sequence (kdd [2004])."
Dataset Splits: No. The paper does not explicitly provide details about training, validation, or test dataset splits.
Hardware Specification: Yes. "To run our experiments, we used a personal computer with 8 cores, a 1.8 GHz processor, and 15.9 GiB of main memory."
Software Dependencies: No. "All our code is written in Python." This specifies only the programming language, not specific software dependencies with version numbers.
Experiment Setup: Yes. "We run all experiments 5 times and report the mean and standard deviation in our plots. We performed a preprocessing step to clean up the datasets. ... we apply min-max scaling to all datasets. We initialize k = 25 centers using k-means++ and then run 50 iterations of local search for both algorithms, for p = 3 swaps."
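The reported setup (k-means++ seeding followed by iterations of p-swap local search) can be sketched as below. This is a simplified illustration only, not the authors' algorithm: the function names are invented, and the greedy choice of which p centers to evict is an assumed heuristic, whereas the paper's method samples and evaluates swaps differently.

```python
import numpy as np

def kmeans_cost(X, centers):
    # Sum of squared distances from each point to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def d2_sample(X, centers, rng):
    # Sample one point with probability proportional to its squared
    # distance to the current centers (the k-means++ distribution).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def kmeans_pp(X, k, rng):
    # Standard k-means++ seeding via iterated D^2-sampling.
    centers = X[rng.choice(len(X))][None, :]
    for _ in range(k - 1):
        centers = np.vstack([centers, d2_sample(X, centers, rng)])
    return centers

def multi_swap_local_search(X, centers, iters=50, p=3, rng=None):
    # Simplified multi-swap local search: each iteration D^2-samples p
    # candidate centers, tentatively swaps them in for the p current
    # centers whose removal increases the cost least (a greedy,
    # assumed heuristic), and keeps the swap only if the clustering
    # cost strictly decreases.
    if rng is None:
        rng = np.random.default_rng(0)
    cost = kmeans_cost(X, centers)
    for _ in range(iters):
        cands = np.vstack([d2_sample(X, centers, rng) for _ in range(p)])
        removal_costs = [kmeans_cost(X, np.delete(centers, i, axis=0))
                         for i in range(len(centers))]
        out = np.argsort(removal_costs)[:p]
        trial = np.vstack([np.delete(centers, out, axis=0), cands])
        trial_cost = kmeans_cost(X, trial)
        if trial_cost < cost:
            centers, cost = trial, trial_cost
    return centers, cost
```

On a small synthetic dataset this mirrors the paper's pipeline shape (seed with k-means++, then run a fixed number of p-swap local-search iterations); the real experiments additionally apply min-max scaling, use k = 25 and 50 iterations, and repeat 5 times to report mean and standard deviation.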