Improved Learning-augmented Algorithms for k-means and k-medians Clustering

Authors: Thy Dinh Nguyen, Anamay Chaturvedi, Huy Nguyen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate algorithm 1 and algorithm 2 on real-world datasets. Our experiments were done on a i9-12900KF processor with 32GB RAM. For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs.
Researcher Affiliation | Academia | Thy Nguyen, Anamay Chaturvedi, Huy Lê Nguyễn, Khoury College of Computer Sciences, Northeastern University, {nguyen.thy2,chaturvedi.a,hu.nguyen}@northeastern.edu
Pseudocode | Yes | Algorithm 1: Deterministic Learning-augmented k-Means Clustering; Algorithm 2: Learning-augmented k-Medians Clustering
Open Source Code | Yes | The repository is hosted at github.com/thydnguyen/LA-Clustering.
Open Datasets | Yes | We test the algorithms on the testing set of the CIFAR-10 dataset (Krizhevsky et al., 2009) (m = 10^4, d = 3072), the PHY dataset from KDD Cup 2004 (KDD Cup 2004), and the MNIST dataset (Deng, 2012) (m = 1797, d = 64).
Dataset Splits | No | The paper mentions using a 'testing set' for evaluation but does not provide specific details on the dataset splits (e.g., percentages, sample counts for train/validation/test, or explicit references to predefined splits with full details).
Hardware Specification | Yes | Our experiments were done on a i9-12900KF processor with 32GB RAM.
Software Dependencies | No | The paper mentions the tools it uses but does not specify software dependencies or version numbers.
Experiment Setup | Yes | For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs. For algorithm 2, we can treat the number of rounds R as a hyperparameter. We set R = 1; as shown below, this is already enough to achieve a good performance compared to the other approaches.
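The reported protocol (k = 10, mean and standard deviation of clustering cost over 20 independent runs) can be sketched as follows. This is a hedged illustration using scikit-learn's digits dataset, which matches the MNIST variant cited in the paper (m = 1797, d = 64); it is not the authors' code, and the baseline k-means run stands in for their learning-augmented algorithms.

```python
# Sketch of the evaluation protocol described in the Experiment Setup row:
# fix k = 10, run 20 independent trials, report mean and std of the cost.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)  # m = 1797 samples, d = 64 features

k, n_runs = 10, 20
costs = []
for seed in range(n_runs):
    # One independent run per seed; inertia_ is the k-means objective
    # (sum of squared distances of points to their nearest center).
    km = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(X)
    costs.append(km.inertia_)

print(f"k-means cost over {n_runs} runs: "
      f"{np.mean(costs):.1f} +/- {np.std(costs):.1f}")
```

For the k-medians variant (Algorithm 2 in the paper), the cost would instead be a sum of (unsquared) distances to the nearest center, and the paper's hyperparameter R = 1 would control the number of rounds; neither is modeled in this baseline sketch.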