Near-Optimal Comparison Based Clustering

Authors: Michaël Perrot, Pascal Esser, Debarghya Ghoshdastidar

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data."
Researcher Affiliation: Academia. Michaël Perrot, Univ Lyon, UJM-Saint-Etienne, CNRS, IOGS, Lab HC UMR 5516, F-42023, Saint-Étienne, France; Pascal Mattia Esser and Debarghya Ghoshdastidar, Department of Informatics, Technical University of Munich.
Pseudocode: Yes. Algorithm 1: Comparison-based SPUR.
Open Source Code: Yes. "We provide a Python implementation on https://github.com/mperrot/AddS-Clustering"
Open Datasets: Yes. "We consider two datasets which are subsets of the MNIST test dataset (LeCun and Cortes, 2010) that originally contains 10000 examples roughly equally distributed among the ten digits: (i) a subset of 2163 examples containing all the 1 and 7 (MNIST 1vs.7), two digits that are visually very similar, and (ii) a randomly selected subset of 2000 examples drawn without replacement and covering all 10 classes (MNIST 10). Second, we consider the Car dataset (Kleindessner and von Luxburg, 2016)."
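The two MNIST subsets described above are straightforward to reconstruct. The sketch below is a hypothetical illustration, not the authors' code: it uses randomly generated stand-in labels in place of the real MNIST test labels (which would be loaded from the dataset itself), and shows only the index-selection step for MNIST 1vs.7 (all 1s and 7s) and MNIST 10 (2000 examples drawn without replacement).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 10000 MNIST test labels (digits 0-9, roughly
# balanced); in practice these come from the actual dataset.
labels = rng.integers(0, 10, size=10000)

# MNIST 1 vs. 7: indices of every example labelled 1 or 7.
idx_1v7 = np.flatnonzero((labels == 1) | (labels == 7))

# MNIST 10: 2000 indices drawn uniformly without replacement,
# so all 10 classes are covered with high probability.
idx_10 = rng.choice(labels.size, size=2000, replace=False)
```

With the real labels, `idx_1v7` would select the 2163 examples reported in the paper; with the dummy labels here it simply selects whichever indices hold a 1 or a 7.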
Dataset Splits: No. The paper uses a "subset of the MNIST test dataset" and simulated data, but does not describe training/validation/test splits or a methodology for creating such splits for its experiments.
Hardware Specification: No. The paper does not report the hardware used for its experiments (CPU/GPU models, clock speeds, or memory).
Software Dependencies: No. The paper mentions a "Python implementation" but does not specify version numbers for Python or for any other software libraries or dependencies used in the experiments.
Experiment Setup: Yes. "As default parameters we use n = 1000, k = 4, ϵ = 0.75, |T| = |Q| = n(ln n)^4, and F_in = N(µ_in, σ²), F_out = N(µ_out, σ²) with σ = 0.1 and δ = 0.5. The partition is obtained by clustering the rows of X_k using k-means."
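The final step quoted above, clustering the rows of an n × k embedding matrix X_k with k-means, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the matrix `Xk` here is a toy stand-in built from four well-separated planted clusters, whereas in comparison-based SPUR it would be derived from the similarity estimates.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_rows(Xk, k, seed=0):
    """Partition the n rows of the embedding matrix Xk into k
    clusters with k-means (hypothetical helper, not from the paper)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return km.fit_predict(Xk)


# Toy planted embedding: k = 4 clusters of 50 points each, with
# well-separated centers (10 * e_i) and small Gaussian noise.
rng = np.random.default_rng(0)
k, per_cluster = 4, 50
Xk = np.vstack(
    [center + 0.1 * rng.normal(size=(per_cluster, k))
     for center in 10 * np.eye(k)]
)
assignments = cluster_rows(Xk, k)
```

Because the planted clusters are well separated relative to the noise, k-means assigns each block of 50 rows to a single cluster, mirroring the exact-recovery regime the paper studies.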