Learning-Augmented $k$-means Clustering
Authors: Jon C. Ergun, Zhili Feng, Sandeep Silwal, David Woodruff, Samson Zhou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we evaluate our algorithms on real and synthetic datasets. We experimentally show that good predictors can be learned for all of our varied datasets, which can aid in clustering. We also show our methodology is more robust than other heuristics such as random sampling. |
| Researcher Affiliation | Academia | Jon C. Ergun (Georgetown Day School, jergun22@gds.org); Zhili Feng (Carnegie Mellon University, zhilif@andrew.cmu.edu); Sandeep Silwal (MIT, silwal@mit.edu); David P. Woodruff (Carnegie Mellon University, dwoodruf@andrew.cmu.edu); Samson Zhou (Carnegie Mellon University, samsonzhou@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Learning-augmented k-means clustering (page 4); Algorithm 2: Coordinate-wise estimation CRDEST (page 4); Algorithm 3: Fast learning-augmented algorithm for k-means clustering (page 6); Algorithm 4: Linear-time k-means algorithm with access to a label predictor Π with deletion rate λ (page 10); Algorithm 5: Learning-augmented k-median clustering (page 11) |
| Open Source Code | No | The paper does not provide any explicit statements or links to its own source code. The only code-related link is to a third-party PyTorch CIFAR-10 model: "Pytorchcifar10. https://github.com/huyvnphan/PyTorch_CIFAR10, 2020." |
| Open Datasets | Yes | 1) Oregon: dataset of 9 graph snapshots sampled across 3 months from an internet router communication network (Leskovec et al., 2005). 2) PHY: dataset from KDD Cup 2004 (kdd, 2004). 3) CIFAR10: testing portion of CIFAR-10 ($n = 10^4$, dimension 3072) (Krizhevsky, 2009). A loading sketch for the CIFAR-10 portion appears after this table. |
| Dataset Splits | No | The paper mentions "train/test split" and refers to using "training portion" of datasets (e.g., CIFAR-10), but it does not specify any details regarding a separate validation split or dataset for its experiments. |
| Hardware Specification | Yes | All of our experiments were done on a 2.7 GHz dual-core Intel Core i5 CPU with 8 GB RAM. |
| Software Dependencies | No | The paper mentions using the PyTorch library for training a neural network in Section E.1, but it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We primarily fix the number of clusters to be k = 10 and k = 25 throughout our experiments for all datasets. [...] To do so, we iterate over a small range of possible α from 0.01 to 0.15 in Algorithm 2 with a step size of 0.01 and select the clustering that results in the lowest objective cost. A sketch of this selection loop appears after this table. |
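The CIFAR10 entry above ($n = 10^4$ points in dimension 3072) corresponds to flattening the 10,000 test images (32×32×3) into vectors. A minimal loading sketch, assuming the torchvision package is available; the paper does not publish its own loading code:

```python
import numpy as np
from torchvision.datasets import CIFAR10

# Download the CIFAR-10 *test* split: 10,000 images of shape 32x32x3.
test_set = CIFAR10(root="./data", train=False, download=True)

# test_set.data is a uint8 array of shape (10000, 32, 32, 3);
# flatten each image into a 3072-dimensional feature vector.
X = test_set.data.reshape(len(test_set.data), -1).astype(np.float64)
print(X.shape)  # (10000, 3072), i.e., n = 10^4 points in dimension 3072
```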
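The Experiment Setup row describes a small grid search over the parameter α of Algorithm 2, keeping whichever clustering attains the lowest k-means objective. Below is a minimal sketch of that selection loop; `crdest_cluster` is a hypothetical stand-in for the paper's Algorithm 2 (CRDEST), whose internals are not reproduced here:

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def select_best_alpha(X, k, crdest_cluster):
    """Grid search over alpha as described in the Experiment Setup row.

    `crdest_cluster(X, k, alpha)` is a hypothetical callable standing in
    for Algorithm 2 (CRDEST); it should return a label array of length n.
    """
    best_alpha, best_labels, best_cost = None, None, np.inf
    # Iterate alpha over 0.01, 0.02, ..., 0.15 and keep the clustering
    # that yields the lowest objective cost.
    for alpha in np.arange(0.01, 0.15 + 1e-9, 0.01):
        labels = crdest_cluster(X, k, alpha)
        cost = kmeans_cost(X, labels, k)
        if cost < best_cost:
            best_alpha, best_labels, best_cost = alpha, labels, cost
    return best_alpha, best_labels, best_cost
```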