Clustering Small Samples With Quality Guarantees: Adaptivity With One2all PPS
Authors: Edith Cohen, Shiri Chechik, Haim Kaplan
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed illustrative experiments for Euclidean k-means clustering on both synthetic and real-world data sets. We implemented our wrapper Algorithm 1 in numpy with the following base clustering algorithm A: We use 5 applications of KMEANS++ and take the set of k centroids that has the smallest clustering cost. This set is used as an initialization to 20 iterations of Lloyd's algorithm. The use of KMEANS++ to initialize Lloyd's algorithm is a prevalent method in practice. [...] Table 1 reports the results of our experiments. |
| Researcher Affiliation | Collaboration | Edith Cohen (Google Research, USA; Tel Aviv University, Israel); Shiri Chechik (Tel Aviv University, Israel); Haim Kaplan (Tel Aviv University, Israel) |
| Pseudocode | Yes | Algorithm 1 Clustering Wrapper |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | MNIST and Fashion MNIST datasets: We use the MNIST data set of images of handwritten digits (LeCun and Cortes 2010) and the Fashion data set of images of clothing items (Xiao, Rasul, and Vollgraf 2017). |
| Dataset Splits | No | The paper mentions using a "validation sample" in the Clustering Wrapper algorithm description, but it does not specify explicit training/validation/test dataset splits for the experiments conducted. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions implementation in "numpy" but does not specify version numbers for numpy or any other software dependencies. |
| Experiment Setup | Yes | We use 5 applications of KMEANS++ and take the set of k centroids that has the smallest clustering cost. This set is used as an initialization to 20 iterations of Lloyd's algorithm. |
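
The base clustering algorithm A quoted in the Experiment Setup row (5 applications of KMEANS++, keep the cheapest seeding, then 20 iterations of Lloyd's algorithm) can be sketched in numpy roughly as follows. This is a minimal illustration, not the authors' code, which is not released: the function names (`sq_dists`, `cost`, `kmeanspp`, `lloyd`, `base_clustering`), the fixed random seed, and the handling of empty clusters are all assumptions, and the sketch covers only the base algorithm, not the wrapper Algorithm 1 or its sampling scheme.

```python
import numpy as np


def sq_dists(X, C):
    # Squared Euclidean distance from every point in X to every center in C.
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def cost(X, C):
    # k-means cost: sum over points of squared distance to the nearest center.
    return sq_dists(X, C).min(axis=1).sum()


def kmeanspp(X, k, rng):
    # k-means++ seeding: the first center is uniform at random, each later
    # center is drawn with probability proportional to the squared distance
    # to the nearest center chosen so far.
    n = X.shape[0]
    first = rng.integers(n)
    centers = [X[first]]
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        # Assumption: fall back to uniform sampling if all distances are zero.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)


def lloyd(X, C, iters=20):
    # A fixed number of Lloyd iterations: assign points to the nearest
    # center, then move each center to the mean of its assigned points.
    C = C.astype(float)  # astype returns a copy, so the input is not modified
    for _ in range(iters):
        labels = sq_dists(X, C).argmin(axis=1)
        for j in range(C.shape[0]):
            members = X[labels == j]
            # Assumption: an empty cluster keeps its previous center.
            if len(members) > 0:
                C[j] = members.mean(axis=0)
    return C


def base_clustering(X, k, n_seedings=5, lloyd_iters=20, seed=0):
    # Run several k-means++ seedings, keep the one with the smallest cost,
    # and use it to initialize Lloyd's algorithm.
    rng = np.random.default_rng(seed)
    seedings = [kmeanspp(X, k, rng) for _ in range(n_seedings)]
    best = min(seedings, key=lambda C: cost(X, C))
    return lloyd(X, best, lloyd_iters)
```

Calling `base_clustering(X, k)` on an `(n, d)` float array returns a `(k, d)` array of centroids. Because each Lloyd iteration cannot increase the k-means cost, the returned centroids cost no more than the best of the seedings they were initialized from.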