Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Speeding up k-means by approximating Euclidean distances via block vectors
Authors: Thomas Bottesch, Thomas Bühler, Markus Kächele
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments, we show a considerable reduction in terms of computational time in comparison to standard methods and the recently proposed Yinyang k-means. |
| Researcher Affiliation | Collaboration | Thomas Bottesch (1,2), Thomas Bühler (1), Markus Kächele (2); (1) Avira Operations GmbH & Co. KG, Tettnang, Germany; (2) Institute of Neural Information Processing, Ulm University, Ulm, Germany |
| Pseudocode | Yes | Algorithm 1 Lloyd's algorithm for k-means; Algorithm 2 Yinyang k-means; Algorithm 3 Local filtering in Yinyang k-means; Algorithm 4 kmeans optimized; Algorithm 5 Local filtering in fast yinyang |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or a link to a code repository. |
| Open Datasets | Yes | A series of experiments were carried out using 9 datasets, most of which are taken from the libsvm homepage1. Additionally, large scale image and malware datasets were analysed. ... 1https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | No | The paper mentions using 9 datasets and that they were scaled, but does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or a description of the splitting methodology). |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | In the experiments, the same initial cluster centers are chosen for all methods. ... Based on the results from this experiment, in the following the size of the block vectors is chosen in such a way that annz(XB) ≈ 0.3 · annz(X). This is achieved by starting off with a static initial block size, and iteratively reducing the block size until annz(XB) < 0.3 · annz(X) holds. In our experiments, this iterative procedure typically needs around 3-4 steps until the condition is met. |
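The core idea of the paper, approximating Euclidean distances via block vectors, can be illustrated with a short sketch. A block vector stores the L2 norm of each contiguous block of features; by the Cauchy-Schwarz inequality applied per block, the inner product of two block vectors upper-bounds the true inner product, which yields a lower bound on the squared Euclidean distance that can skip exact distance computations in k-means. The function names and the fixed `block_size` below are illustrative, not the paper's implementation:

```python
import numpy as np

def block_vector(x, block_size):
    # Zero-pad x to a multiple of block_size, then take the
    # L2 norm of each contiguous block of coordinates.
    pad = (-len(x)) % block_size
    xp = np.pad(x, (0, pad))
    return np.linalg.norm(xp.reshape(-1, block_size), axis=1)

def squared_distance_lower_bound(x, c, block_size):
    # Per block b, Cauchy-Schwarz gives <x_b, c_b> <= ||x_b|| * ||c_b||,
    # so <block_vector(x), block_vector(c)> >= <x, c>, and therefore
    # ||x - c||^2 = ||x||^2 + ||c||^2 - 2<x, c>
    #            >= ||x||^2 + ||c||^2 - 2<xb, cb>.
    xb = block_vector(x, block_size)
    cb = block_vector(c, block_size)
    return np.dot(x, x) + np.dot(c, c) - 2.0 * np.dot(xb, cb)
```

In a k-means assignment step, a point's candidate center can be rejected whenever this cheap lower bound already exceeds the best exact distance found so far, so the full-dimensional distance is only computed for the surviving candidates.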