Fast and Provably Good Seedings for k-Means

Authors: Olivier Bachem, Mario Lucic, Hamed Hassani, Andreas Krause

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theoretical results in extensive experiments on a variety of real-world data sets.
Researcher Affiliation | Academia | Olivier Bachem (Department of Computer Science, ETH Zurich, olivier.bachem@inf.ethz.ch); Mario Lucic (Department of Computer Science, ETH Zurich, lucic@inf.ethz.ch); S. Hamed Hassani (Department of Computer Science, ETH Zurich, hamed@inf.ethz.ch); Andreas Krause (Department of Computer Science, ETH Zurich, krausea@ethz.ch)
Pseudocode | Yes | Algorithm 1: ASSUMPTION-FREE K-MC2 (AFK-MC2)
Open Source Code | Yes | An implementation of ASSUMPTION-FREE K-MC2 has been released at http://olivierbachem.ch.
Open Datasets | Yes | Table 1 lists the data sets used in the experimental evaluation: CSN (earthquakes), KDD (protein homology), RNA (RNA sequences), SONG (music songs), SUSY (supersymmetric particles), WEB (web users).
Dataset Splits | No | The paper mentions retaining "250,000 data points as the holdout set for the evaluation" for some datasets, which implies a held-out test set, but it does not provide explicit train/validation/test splits with percentages or full details for all datasets used.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., CPU/GPU models or memory).
Software Dependencies | No | The paper does not provide version numbers for any software dependencies used in the experiments.
Experiment Setup | No | The paper compares different chain lengths m (e.g., m ∈ {1, 2, 5, 10, 20, 50, 100, 150, 200}), which are parameters of the seeding algorithm, but it does not explicitly report other experimental setup details, such as the configuration of any subsequent refinement step (e.g., Lloyd's algorithm iterations or stopping criteria).
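The Algorithm 1 referenced above (ASSUMPTION-FREE K-MC2) seeds k-means by replacing exact D²-sampling with short Metropolis-Hastings chains over a proposal distribution built from the first, uniformly sampled center. A minimal NumPy sketch of that idea follows; it assumes squared Euclidean distances, and the function name and structure are illustrative rather than taken from the authors' released implementation:

```python
import numpy as np

def afk_mc2(X, k, m, rng=None):
    """Sketch of AFK-MC^2 seeding: pick k centers from the rows of X
    using Markov chains of length m instead of exact D^2-sampling.
    Names and structure are illustrative, not the official release."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)

    # Step 1: sample the first center uniformly at random.
    c0 = X[rng.integers(n)]

    # Step 2: assumption-free proposal distribution
    # q(x) = 1/2 * d(x, c0)^2 / sum_x' d(x', c0)^2 + 1/(2n).
    d2 = np.sum((X - c0) ** 2, axis=1)
    q = 0.5 * d2 / d2.sum() + 0.5 / n

    centers = [c0]
    for _ in range(k - 1):
        # Start each chain from a draw of the proposal q.
        idx = rng.choice(n, p=q)
        dist = min(np.sum((X[idx] - c) ** 2) for c in centers)
        for _ in range(m - 1):
            # Propose a candidate from q and apply the
            # Metropolis-Hastings acceptance ratio
            # min(1, d(y, C)^2 q(x) / (d(x, C)^2 q(y))).
            cand = rng.choice(n, p=q)
            cand_dist = min(np.sum((X[cand] - c) ** 2) for c in centers)
            accept = (cand_dist * q[idx]) / (dist * q[cand] + 1e-12)
            if rng.random() < min(1.0, accept):
                idx, dist = cand, cand_dist
        centers.append(X[idx])
    return np.array(centers)
```

Larger chain lengths m trade extra distance evaluations for seedings closer to exact k-means++, which is the tradeoff the paper's m ∈ {1, ..., 200} sweep explores.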