Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Speeding up k-means by approximating Euclidean distances via block vectors
Authors: Thomas Bottesch, Thomas Bühler, Markus Kächele
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments, we show a considerable reduction in terms of computational time in comparison to standard methods and the recently proposed Yinyang k-means. |
| Researcher Affiliation | Collaboration | Thomas Bottesch (1,2), Thomas Bühler (1), Markus Kächele (2); (1) Avira Operations GmbH & Co. KG, Tettnang, Germany; (2) Institute of Neural Information Processing, Ulm University, Ulm, Germany |
| Pseudocode | Yes | Algorithm 1 Lloyd's algorithm for k-means; Algorithm 2 Yinyang k-means; Algorithm 3 Local filtering in Yinyang k-means; Algorithm 4 kmeans optimized; Algorithm 5 Local filtering in fast yinyang |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or a link to a code repository. |
| Open Datasets | Yes | A series of experiments were carried out using 9 datasets, most of which are taken from the libsvm homepage1. Additionally, large scale image and malware datasets were analysed. ... 1https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | No | The paper mentions using 9 datasets and that they were scaled, but does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or a description of the splitting methodology). |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | In the experiments, the same initial cluster centers are chosen for all methods. ... Based on the results from this experiment, in the following the size of the block vectors is chosen in such a way that annz(XB) ≈ 0.3 · annz(X). This is achieved by starting off with a static initial block size, and iteratively reducing the block size until annz(XB) < 0.3 · annz(X) holds. In our experiments, this iterative procedure typically needs around 3-4 steps until the condition is met. |
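The core idea of the paper, approximating Euclidean distances via block vectors, can be illustrated with a short sketch. A block vector stores the L2 norm of each contiguous block of features; by the Cauchy-Schwarz inequality applied per block, the inner product of two block vectors upper-bounds the true inner product, which yields a lower bound on the squared Euclidean distance that can skip exact distance computations in k-means. The function names and the fixed `block_size` below are illustrative, not the paper's implementation:

```python
import numpy as np

def block_vector(x, block_size):
    # Zero-pad x to a multiple of block_size, then take the
    # L2 norm of each contiguous block of coordinates.
    pad = (-len(x)) % block_size
    xp = np.pad(x, (0, pad))
    return np.linalg.norm(xp.reshape(-1, block_size), axis=1)

def squared_distance_lower_bound(x, c, block_size):
    # Per block b, Cauchy-Schwarz gives <x_b, c_b> <= ||x_b|| * ||c_b||,
    # so <block_vector(x), block_vector(c)> >= <x, c>, and therefore
    # ||x - c||^2 = ||x||^2 + ||c||^2 - 2<x, c>
    #            >= ||x||^2 + ||c||^2 - 2<xb, cb>.
    xb = block_vector(x, block_size)
    cb = block_vector(c, block_size)
    return np.dot(x, x) + np.dot(c, c) - 2.0 * np.dot(xb, cb)
```

In a k-means assignment step, a point's candidate center can be rejected whenever this cheap lower bound already exceeds the best exact distance found so far, so the full-dimensional distance is only computed for the surviving candidates.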