Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Clustering with Same-Cluster Queries
Authors: Hassan Ashtiani, Shrinu Kushagra, Shai Ben-David
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of k-means clustering (i.e., when the expert conforms to a solution of k-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks O k2 log k + k log n) same-cluster queries and runs with time complexity O kn log n). |
| Researcher Affiliation | Academia | Hassan Ashtiani , Shrinu Kushagra and Shai Ben-David David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario, Canada EMAIL |
| Pseudocode | Yes | Algorithm 1: Algorithm for γ(> 1)-margin instances with queries |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and complexity analysis; it does not describe experiments using publicly available datasets or provide access information for any dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not mention any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe the hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes an algorithm theoretically but does not provide specific experimental setup details like hyperparameters or training configurations for an empirical study. |