Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast k-means with accurate bounds

Authors: James Newling, François Fleuret

ICML 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare 23 k-means implementations, including our own implementations of all algorithms described, original implementations accompanying the papers (Hamerly, 2010; Drake, 2013; Ding et al., 2015), and implementations in two popular machine learning libraries, VLFeat and mlpack. We use the following notation to refer to implementations: {codesource-algorithm}, where codesource is one of bay (Hamerly, 2015), mlp (Curtin et al., 2013), pow (Low et al., 2010), vlf (Vedaldi & Fulkerson, 2008) and own (our own code), and algorithm is one of the algorithms described. |
| Researcher Affiliation | Academia | James Newling EMAIL Idiap Research Institute & EPFL, Switzerland François Fleuret EMAIL Idiap Research Institute & EPFL, Switzerland |
| Pseudocode | No | The paper describes algorithms conceptually and mathematically but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Fully parallelised implementations of all algorithms are provided under an open-source license at https://github.com/idiap/eakmeans |
| Open Datasets | Yes | Table 1. The 22 datasets used in experiments, ranging in dimension from 2 to 784. The datasets come from: the UCI, KDD and KEEL repositories (11,2,2), MNIST and STL-10 image databases (2,1), random (2), European Bioinformatics Institute (1) and Joensuu University (1). Full names and further details in D. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or methodology for partitioning the data). |
| Hardware Specification | Yes | All experiments are performed using double precision floating point numbers. We compare 23 k-means implementations... on a machine with an Intel i7 processor and 8MB of cache memory. |
| Software Dependencies | No | The paper mentions 'C++11 thread support library' and external libraries like VLFeat, mlpack, and OpenBLAS, but it does not provide specific version numbers for these or other key software components used in the authors' own implementation. |
| Experiment Setup | No | The paper describes algorithmic details but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or detailed training configurations. |
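
The {codesource-algorithm} notation quoted in the Research Type entry can be illustrated with a small parser. This is a minimal sketch, not code from the paper or the eakmeans repository. The names `parse_label` and `ImplementationLabel` are hypothetical, and the algorithm abbreviation `ham` in the usage line is an assumed placeholder. Only the five codesource prefixes and their citations are taken from the quoted text.

```python
from typing import NamedTuple

# Code-source prefixes and citations exactly as listed in the quoted passage.
CODE_SOURCES = {
    "bay": "Hamerly, 2015",
    "mlp": "Curtin et al., 2013",
    "pow": "Low et al., 2010",
    "vlf": "Vedaldi & Fulkerson, 2008",
    "own": "our own code",
}


class ImplementationLabel(NamedTuple):
    """Hypothetical container for a parsed {codesource-algorithm} label."""
    codesource: str
    algorithm: str


def parse_label(label: str) -> ImplementationLabel:
    """Split a label of the form 'codesource-algorithm' at the first hyphen."""
    codesource, sep, algorithm = label.partition("-")
    if not sep or not algorithm:
        raise ValueError(f"expected 'codesource-algorithm', got {label!r}")
    if codesource not in CODE_SOURCES:
        raise ValueError(f"unknown codesource {codesource!r}")
    return ImplementationLabel(codesource, algorithm)


# Usage: 'ham' is an assumed algorithm abbreviation, used only for illustration.
parsed = parse_label("own-ham")
print(parsed.codesource, "->", CODE_SOURCES[parsed.codesource])
```

Splitting at the first hyphen only keeps any further hyphens inside the algorithm part, which is a design choice of this sketch rather than something the source specifies.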