Clustering with Same-Cluster Queries

Authors: Hassan Ashtiani, Shrinu Kushagra, Shai Ben-David

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of k-means clustering (i.e., when the expert conforms to a solution of k-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks O k2 log k + k log n) same-cluster queries and runs with time complexity O kn log n).
Researcher Affiliation Academia Hassan Ashtiani , Shrinu Kushagra and Shai Ben-David David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario, Canada {mhzokaei,skushagr,shai}@uwaterloo.ca
Pseudocode Yes Algorithm 1: Algorithm for γ(> 1)-margin instances with queries
Open Source Code No The paper does not provide any concrete access to source code for the methodology described.
Open Datasets No The paper is theoretical and focuses on algorithm design and complexity analysis; it does not describe experiments using publicly available datasets or provide access information for any dataset used for training.
Dataset Splits No The paper is theoretical and does not mention any training, validation, or test dataset splits.
Hardware Specification No The paper is theoretical and does not describe the hardware used to run experiments.
Software Dependencies No The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup No The paper describes an algorithm theoretically but does not provide specific experimental setup details like hyperparameters or training configurations for an empirical study.