Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast and Provably Good Seedings for k-Means
Authors: Olivier Bachem, Mario Lucic, Hamed Hassani, Andreas Krause
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical results in extensive experiments on a variety of real-world data sets. |
| Researcher Affiliation | Academia | Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause — Department of Computer Science, ETH Zurich |
| Pseudocode | Yes | Algorithm 1 ASSUMPTION-FREE K-MC2 (AFK-MC2) |
| Open Source Code | Yes | An implementation of ASSUMPTION-FREE K-MC2 has been released at http://olivierbachem.ch. |
| Open Datasets | Yes | Table 1: Data sets used in experimental evaluation — CSN (earthquakes), KDD (protein homology), RNA (RNA sequences), SONG (music songs), SUSY (supersym. particles), WEB (web users) |
| Dataset Splits | No | The paper states that it 'retain[s] 250,000 data points as the holdout set for the evaluation' for some datasets, which implies a test/validation set, but it does not provide explicit train/validation/test splits with percentages or full details for all datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | No | The paper compares different chain lengths m (m ∈ {1, 2, 5, 10, 20, 50, 100, 150, 200}), which are parameters of the algorithm, but it does not provide other common experimental setup details such as hyperparameters or settings for the subsequent refinement step (e.g., Lloyd's algorithm configuration). |
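To illustrate the role of the chain-length parameter m discussed above, the following is a minimal sketch of MCMC-based k-means seeding in the spirit of the paper's k-MC2 family: each new center is chosen by running a Metropolis-Hastings chain of length m that targets the D² sampling distribution of k-means++. This sketch uses a uniform proposal for simplicity; the paper's AFK-MC2 algorithm instead uses a data-dependent, assumption-free proposal distribution, so this is not the authors' exact method.

```python
import numpy as np

def mcmc_seeding(X, k, m, rng=None):
    """Sketch of MCMC-based seeding (k-MC2 style, uniform proposal).

    Each new center is drawn by a Metropolis-Hastings chain of length m
    targeting the D^2 distribution w.r.t. the centers chosen so far.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    centers = [X[rng.integers(n)]]  # first center sampled uniformly at random
    for _ in range(k - 1):
        # start the chain at a uniformly sampled point
        x = X[rng.integers(n)]
        dx = min(np.sum((x - c) ** 2) for c in centers)
        for _ in range(m - 1):
            y = X[rng.integers(n)]  # uniform proposal (AFK-MC2 differs here)
            dy = min(np.sum((y - c) ** 2) for c in centers)
            # accept y with probability min(1, d(y)^2 / d(x)^2)
            if dx == 0 or dy / dx > rng.random():
                x, dx = y, dy
        centers.append(x)
    return np.array(centers)
```

Larger m trades more distance evaluations for a seeding closer to true D² sampling, which is the trade-off the paper's experiments sweep over.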