Near-Optimal Comparison Based Clustering
Authors: Michaël Perrot, Pascal Esser, Debarghya Ghoshdastidar
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data. |
| Researcher Affiliation | Academia | Michaël Perrot Univ Lyon, UJM-Saint-Etienne, CNRS, IOGS, Lab HC UMR 5516, F-42023, SAINT-ETIENNE, France Pascal Mattia Esser , Debarghya Ghoshdastidar Department of Informatics Technical University of Munich |
| Pseudocode | Yes | Algorithm 1: Comparison-based SPUR |
| Open Source Code | Yes | 3We provide a Python implementation on https://github.com/mperrot/Add S-Clustering |
| Open Datasets | Yes | We consider two datasets which are subsets of the MNIST test dataset (Le Cun and Cortes, 2010) that originally contains 10000 examples roughly equally distributed among the ten digits: (i) a subset of 2163 examples containing all the 1 and 7 (MNIST 1vs.7), two digits that are visually very similar, and (ii) a randomly selected subset of 2000 examples drawn without replacement and covering all 10 classes (MNIST 10). Second, we consider the Car dataset (Kleindessner and von Luxburg, 2016). |
| Dataset Splits | No | The paper uses a 'subset of the MNIST test dataset' and 'Simulated data' but does not provide specific training/validation/test dataset splits or methodologies for creating such splits from the raw data for their experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Python implementation' but does not specify version numbers for Python or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | As default parameters we use n = 1000, k = 4, ϵ = 0.75, |T | = |Q| = n(ln n)4 and Fin = N µin , σ2 , Fout = N µout , σ2 with σ = 0.1 and δ = 0.5. The partition is obtained by clustering the rows of Xk using k-means. |