Improved Learning-augmented Algorithms for k-means and k-medians Clustering
Authors: Thy Dinh Nguyen, Anamay Chaturvedi, Huy Nguyen
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate algorithm 1 and algorithm 2 on real-world datasets. Our experiments were done on a i9-12900KF processor with 32GB RAM. For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs. |
| Researcher Affiliation | Academia | Thy Nguyen , Anamay Chaturvedi , Huy Lê Nguy ên Khoury College of Computer Sciences, Northeastern University {nguyen.thy2,chaturvedi.a,hu.nguyen}@northeastern.edu |
| Pseudocode | Yes | Algorithm 1 Deterministic Learning-augmented k-Means Clustering Algorithm 2 Learning-augmented k-Medians Clustering |
| Open Source Code | Yes | The repository is hosted at github.com/thydnguyen/LA-Clustering. |
| Open Datasets | Yes | We test the algorithms on the testing set of the CIFAR-10 dataset (Krizhevsky et al., 2009) (m = 104, d = 3072), the PHY dataset from KDD Cup 2004 (KDD Cup 2004), and the MNIST dataset (Deng, 2012) (m = 1797, d = 64). |
| Dataset Splits | No | The paper mentions using 'testing set' for evaluation but does not provide specific details on the dataset splits (e.g., percentages, sample counts for train/validation/test, or explicit references to predefined splits with full details). |
| Hardware Specification | Yes | Our experiments were done on a i9-12900KF processor with 32GB RAM. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs. For algorithm 2, we can treat the number of rounds R as a hyperparameter. We set R = 1; as shown below, this is already enough to achieve a good performance compared to the other approaches. |