Local algorithms for interactive clustering

Authors: Pranjal Awasthi, Maria Balcan, Konstantin Voevodski

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We propose a theoretical interactive model and provide strong experimental evidence supporting the practical applicability our algorithms. In Section 5 we demonstrate the effectiveness of our algorithms on real data.
Researcher Affiliation Collaboration Pranjal Awasthi PAWASTHI@CS.CMU.EDU Carnegie Mellon University, Pittsburgh, USA Maria-Florina Balcan NINAMF@CC.GATECH.EDU Georgia Institute of Technology, Atlanta, USA Konstantin Voevodski KVODSKI@GOOGLE.COM Google, NY, USA
Pseudocode Yes Figure 1. Split procedure Algorithm: SPLIT PROCEDURE Input: Cluster Ci, global average-linkage tree Tavg. Figure 2. Merge procedure Algorithm: MERGE PROCEDURE Input: Clusters Ci and Cj, global average-linkage tree Tavg.
Open Source Code No The paper does not provide an explicit statement or link to the authors' own open-source code for their methodology.
Open Datasets Yes newsgroup documents data.1http://people.csail.mit.edu/jrennie/20Newsgroups/
Dataset Splits No The paper describes perturbing ground-truth to create initial clusterings and mentions dataset sizes, but does not specify explicit train/validation/test splits, proportions, or cross-validation settings.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies No The paper does not specify any software dependencies or versions used for the experiments.
Experiment Setup Yes We compute an initial clustering by perturbing the ground-truth. In each iteration, we compute the set of all feasible splits and merges: a split of a cluster is feasible if it contains points from 2 or more ground-truth clusters, and a merge is feasible if at least an ηfraction of points in each cluster are from the same ground-truth cluster. Then, we choose one of the feasible edits uniformly at random, and ask the algorithm to compute the corresponding edit. We continue this process until we find the ground-truth clustering or we reach 20000 iterations. In order to simulate this scenario, when we compute the initial clustering, for each document we keep its ground-truth cluster assignment with probability 0.95, and otherwise reassign it to one of the other clusters, which is chosen uniformly at random.