Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++
Authors: Dennis Wei
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies the k-means++ algorithm for clustering as well as the class of Dℓ sampling algorithms to which k-means++ belongs. It is shown that for any constant factor β > 1, selecting βk cluster centers by Dℓsampling yields a constant-factor approximation to the optimal clustering with k centers, in expectation and without conditions on the dataset. This result extends the previously known O(log k) guarantee for the case β = 1 to the constant-factor bi-criteria regime. It also improves upon an existing constant-factor bi-criteria result that holds only with constant probability. |
| Researcher Affiliation | Industry | Dennis Wei IBM Research Yorktown Heights, NY 10598, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 DℓSampling Input: Data points x1, . . . , xn, number of clusters t. Initialize φ(xi) = 1 for i = 1, . . . , n. for j = 1 to t do Select jth center cj = xi with probability φ(xi)/φ. Update φ(xi) for i = 1, . . . , n. |
| Open Source Code | No | The paper is theoretical and does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not mention the use of any specific publicly available datasets for training or provide access information. |
| Dataset Splits | No | The paper is theoretical and does not describe any specific training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, such as hyperparameters or system-level training settings. |