Speeding up k-means by approximating Euclidean distances via block vectors

Authors: Thomas Bottesch, Thomas Bühler, Markus Kächele

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In extensive experiments, we show a considerable reduction in terms of computational time in comparison to standard methods and the recently proposed Yinyang k-means.
Researcher Affiliation Collaboration Thomas Bottesch1,2 THOMAS.BOTTESCH@AVIRA.COM Thomas B uhler1 THOMAS.BUEHLER@AVIRA.COM Markus K achele2 MARKUS.KAECHELE@UNI-ULM.DE 1Avira Operations Gmb H & Co. KG, Tettnang, Germany 2Institute of Neural Information Processing, Ulm University, Ulm, Germany
Pseudocode Yes Algorithm 1 Lloyd s algorithm for k-means; Algorithm 2 Yinyang k-means; Algorithm 3 Local filtering in Yinyang k-means; Algorithm 4 kmeans optimized; Algorithm 5 Local filtering in fast yinyang
Open Source Code No The paper does not provide any explicit statements about releasing source code or a link to a code repository.
Open Datasets Yes A series of experiments were carried out using 9 datasets, most of which are taken from the libsvm homepage1. Additionally, large scale image and malware datasets were analysed. ... 1https://www.csie.ntu.edu.tw/ cjlin/ libsvmtools/datasets/
Dataset Splits No The paper mentions using 9 datasets and that they were scaled, but does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or a description of the splitting methodology).
Hardware Specification No The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies No The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup Yes In the experiments, the same initial cluster centers are chosen for all methods. ... Based on the results from this experiment, in the following the size of the block vectors is chosen in such a way that annz(XB) 0.3 annz(X). This is achieved by starting off with a static initial block size, and iteratively reducing the block size until annz(XB) < 0.3 annz(X) holds. In our experiments, this iterative procedure typically needs around 3-4 steps until the condition is met.