Learning-Augmented $k$-means Clustering
Authors: Jon C. Ergun, Zhili Feng, Sandeep Silwal, David Woodruff, Samson Zhou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we evaluate our algorithms on real and synthetic datasets. We experimentally show that good predictors can be learned for all of our varied datasets, which can aid in clustering. We also show our methodology is more robust than other heuristics such as random sampling. |
| Researcher Affiliation | Academia | Jon C. Ergun (Georgetown Day School, jergun22@gds.org); Zhili Feng (Carnegie Mellon University, zhilif@andrew.cmu.edu); Sandeep Silwal (MIT, silwal@mit.edu); David P. Woodruff (Carnegie Mellon University, dwoodruf@andrew.cmu.edu); Samson Zhou (Carnegie Mellon University, samsonzhou@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Learning-augmented k-means clustering (page 4); Algorithm 2: Coordinate-wise estimation CRDEST (page 4); Algorithm 3: Fast learning-augmented algorithm for k-means clustering (page 6); Algorithm 4: Linear-time k-means algorithm with access to a label predictor Π with deletion rate λ (page 10); Algorithm 5: Learning-augmented k-median clustering (page 11) |
| Open Source Code | No | The paper does not provide any explicit statements or links to its own source code. The only code-related link is to a third-party PyTorch CIFAR-10 model: "Pytorchcifar10. https://github.com/huyvnphan/PyTorch_CIFAR10, 2020." |
| Open Datasets | Yes | 1) Oregon: dataset of 9 graph snapshots sampled across 3 months from an internet router communication network (Leskovec et al., 2005). 2) PHY: dataset from KDD Cup 2004 (kdd, 2004). 3) CIFAR10: testing portion of CIFAR-10 ($n = 10^4$, dimension 3072) (Krizhevsky, 2009). A loading sketch for the CIFAR-10 portion appears after this table. |
| Dataset Splits | No | The paper mentions "train/test split" and refers to using "training portion" of datasets (e.g., CIFAR-10), but it does not specify any details regarding a separate validation split or dataset for its experiments. |
| Hardware Specification | Yes | All of our experiments were done on a 2.7 GHz dual-core Intel Core i5 CPU with 8 GB RAM. |
| Software Dependencies | No | The paper mentions using the PyTorch library for training a neural network in Section E.1, but it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We primarily fix the number of clusters to be k = 10 and k = 25 throughout our experiments for all datasets. [...] To do so, we iterate over a small range of possible α from 0.01 to 0.15 in Algorithm 2 with a step size of 0.01 and select the clustering that results in the lowest objective cost. A sketch of this selection loop appears after this table. |
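The CIFAR10 entry above ($n = 10^4$ points in dimension 3072) corresponds to flattening the 10,000 test images (32×32×3) into vectors. A minimal loading sketch, assuming the torchvision package is available; the paper does not publish its own loading code:

```python
import numpy as np
from torchvision.datasets import CIFAR10

# Download the CIFAR-10 *test* split: 10,000 images of shape 32x32x3.
test_set = CIFAR10(root="./data", train=False, download=True)

# test_set.data is a uint8 array of shape (10000, 32, 32, 3);
# flatten each image into a 3072-dimensional feature vector.
X = test_set.data.reshape(len(test_set.data), -1).astype(np.float64)
print(X.shape)  # (10000, 3072), i.e., n = 10^4 points in dimension 3072
```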
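The Experiment Setup row describes a small grid search over the parameter α of Algorithm 2, keeping whichever clustering attains the lowest k-means objective. Below is a minimal sketch of that selection loop; `crdest_cluster` is a hypothetical stand-in for the paper's Algorithm 2 (CRDEST), whose internals are not reproduced here:

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def select_best_alpha(X, k, crdest_cluster):
    """Grid search over alpha as described in the Experiment Setup row.

    `crdest_cluster(X, k, alpha)` is a hypothetical callable standing in
    for Algorithm 2 (CRDEST); it should return a label array of length n.
    """
    best_alpha, best_labels, best_cost = None, None, np.inf
    # Iterate alpha over 0.01, 0.02, ..., 0.15 and keep the clustering
    # that yields the lowest objective cost.
    for alpha in np.arange(0.01, 0.15 + 1e-9, 0.01):
        labels = crdest_cluster(X, k, alpha)
        cost = kmeans_cost(X, labels, k)
        if cost < best_cost:
            best_alpha, best_labels, best_cost = alpha, labels, cost
    return best_alpha, best_labels, best_cost
```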