Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Clustering Stable Instances of Euclidean k-means.
Authors: Aravindan Vijayaraghavan, Abhratanu Dutta, Alex Wang
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Algorithm 3.1 on multiple real world datasets and compare its performance to the performance of k-means++, and also check how well these datasets satisfy our geometric conditions. Table 1: Comparison of k-means cost for Alg 3.1 and k-means++ |
| Researcher Affiliation | Academia | Abhratanu Dutta Northwestern University EMAIL Aravindan Vijayaraghavan Northwestern University EMAIL Alex Wang Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 3.1. Input: X = {x1, ..., xn}, k. 1: for all pairs a, b of distinct points in {xi} do 2: Let r = ‖a − b‖ be our guess for ρ 3: procedure INITIALIZE 4: Create graph G on vertex set {x1, ..., xn} where xi and xj have an edge iff ‖xi − xj‖ < r 5: Let a1, ..., ak ∈ R^d where ai is the mean of the ith largest connected component of G 6: procedure ASSIGN 7: Let C1, ..., Ck be the clusters obtained by assigning each point in X to the closest ai 8: Calculate the k-means objective of C1, ..., Ck 9: Return clustering with smallest k-means objective found above |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Experiments were run on unnormalized and normalized versions of four labeled datasets from the UCI Machine Learning Repository: Wine (n = 178, k = 3, d = 13), Iris (n = 150, k = 3, d = 4), Banknote Authentication (n = 1372, k = 2, d = 5), and Letter Recognition (n = 20,000, k = 26, d = 16). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper does not contain specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text. |
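
Since the paper releases no source code, the Algorithm 3.1 pseudocode quoted in the Pseudocode row can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. The function names (`cluster_stable_instance`, `connected_components`, `kmeans_cost`, `mean`) are hypothetical. Three choices are assumptions not stated in the pseudocode: guesses for r that yield fewer than k connected components are skipped, duplicate pairwise distances are tried only once, and the k-means objective is computed against the mean of each resulting cluster.

```python
from itertools import combinations
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def connected_components(points, r):
    """Components of the graph with an edge iff distance < r (iterative DFS)."""
    unseen = set(range(len(points)))
    components = []
    while unseen:
        stack = [unseen.pop()]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            neighbors = [j for j in unseen if dist(points[i], points[j]) < r]
            unseen.difference_update(neighbors)
            stack.extend(neighbors)
        components.append(component)
    return components


def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to the mean of its cluster."""
    cost = 0.0
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            mu = mean(members)
            cost += sum(dist(p, mu) ** 2 for p in members)
    return cost


def cluster_stable_instance(points, k):
    """Try each pairwise distance as the guess r; keep the cheapest clustering."""
    best_labels, best_cost = None, float("inf")
    # Assumption: identical distances give identical graphs, so try each once.
    radii = {dist(a, b) for a, b in combinations(points, 2)}
    for r in sorted(radii):
        # INITIALIZE: centers are the means of the k largest components.
        comps = connected_components(points, r)
        if len(comps) < k:
            continue  # assumption: a guess with fewer than k components is skipped
        comps.sort(key=len, reverse=True)
        centers = [mean([points[i] for i in comp]) for comp in comps[:k]]
        # ASSIGN: each point goes to its closest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        cost = kmeans_cost(points, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost
```

Calling `cluster_stable_instance(points, k)` with a list of coordinate tuples returns one cluster label per point together with the k-means cost. The sketch rebuilds the graph for every guess, so it is only practical for small inputs such as the Wine or Iris datasets.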