Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Improved Learning-augmented Algorithms for k-means and k-medians Clustering

Authors: Thy Dinh Nguyen, Anamay Chaturvedi, Huy Nguyen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate algorithm 1 and algorithm 2 on real-world datasets. Our experiments were done on a i9-12900KF processor with 32GB RAM. For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs.
Researcher Affiliation | Academia | Thy Nguyen, Anamay Chaturvedi, Huy Lê Nguyễn. Khoury College of Computer Sciences, Northeastern University. EMAIL
Pseudocode | Yes | Algorithm 1: Deterministic Learning-augmented k-Means Clustering; Algorithm 2: Learning-augmented k-Medians Clustering
Open Source Code | Yes | The repository is hosted at github.com/thydnguyen/LA-Clustering.
Open Datasets | Yes | We test the algorithms on the testing set of the CIFAR-10 dataset (Krizhevsky et al., 2009) (m = 10^4, d = 3072), the PHY dataset from KDD Cup 2004 (KDD Cup 2004), and the MNIST dataset (Deng, 2012) (m = 1797, d = 64).
Dataset Splits | No | The paper mentions using the 'testing set' for evaluation but does not provide specific details on the dataset splits (e.g., percentages, sample counts for train/validation/test, or explicit references to predefined splits with full details).
Hardware Specification | Yes | Our experiments were done on a i9-12900KF processor with 32GB RAM.
Software Dependencies | No | The paper mentions using
Experiment Setup | Yes | For all experiments, we fix the number of points to be allocated k = 10, and report the average and the standard deviation error of the clustering cost over 20 independent runs. For algorithm 2, we can treat the number of rounds R as a hyperparameter. We set R = 1; as shown below, this is already enough to achieve a good performance compared to the other approaches.
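
The reporting protocol in the Experiment Setup entry (k = 10, mean and standard deviation of the clustering cost over 20 independent runs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: plain Lloyd's iterations stand in for the paper's learning-augmented algorithms, the data is synthetic, the function names (`kmeans_cost`, `lloyd_kmeans`, `evaluate`) are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
import random
import statistics

def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(x, ctr)) for ctr in centers)
        for x in points
    )

def lloyd_kmeans(points, k, rng, iters=10):
    """Plain Lloyd's iterations from random seeds (stand-in baseline, NOT the paper's algorithm)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            # Assign x to the index of its nearest current center.
            j = min(range(k), key=lambda i: sum((p - c) ** 2 for p, c in zip(x, centers[i])))
            clusters[j].append(x)
        # Recompute each center as the coordinate-wise mean; keep the old center if a cluster is empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

def evaluate(points, k=10, runs=20, seed=0):
    """Return (mean, sample standard deviation) of the clustering cost over independent runs."""
    costs = []
    for r in range(runs):
        rng = random.Random(seed + r)  # one independent seed per run
        costs.append(kmeans_cost(points, lloyd_kmeans(points, k, rng)))
    return statistics.mean(costs), statistics.stdev(costs)

# Synthetic 2-D data, purely illustrative (not one of the paper's datasets).
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
mean_cost, std_cost = evaluate(points)
print(f"cost over 20 runs: {mean_cost:.2f} +/- {std_cost:.2f}")
```

Seeding each run separately keeps the 20 runs independent yet repeatable, so the reported mean and deviation can be regenerated exactly.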