Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Improved Learning-augmented Algorithms for k-means and k-medians Clustering
Authors: Thy Dinh Nguyen, Anamay Chaturvedi, Huy Nguyen
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate algorithm 1 and algorithm 2 on real-world datasets. Our experiments were done on a i9-12900KF processor with 32GB RAM. For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs. |
| Researcher Affiliation | Academia | Thy Nguyen , Anamay Chaturvedi , Huy Lê Nguy ên Khoury College of Computer Sciences, Northeastern University EMAIL |
| Pseudocode | Yes | Algorithm 1 Deterministic Learning-augmented k-Means Clustering Algorithm 2 Learning-augmented k-Medians Clustering |
| Open Source Code | Yes | The repository is hosted at github.com/thydnguyen/LA-Clustering. |
| Open Datasets | Yes | We test the algorithms on the testing set of the CIFAR-10 dataset (Krizhevsky et al., 2009) (m = 104, d = 3072), the PHY dataset from KDD Cup 2004 (KDD Cup 2004), and the MNIST dataset (Deng, 2012) (m = 1797, d = 64). |
| Dataset Splits | No | The paper mentions using 'testing set' for evaluation but does not provide specific details on the dataset splits (e.g., percentages, sample counts for train/validation/test, or explicit references to predefined splits with full details). |
| Hardware Specification | Yes | Our experiments were done on a i9-12900KF processor with 32GB RAM. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs. For algorithm 2, we can treat the number of rounds R as a hyperparameter. We set R = 1; as shown below, this is already enough to achieve a good performance compared to the other approaches. |