reproducibilityindex.ai

Near-Optimal Comparison Based Clustering

Authors: Michaël Perrot, Pascal Esser, Debarghya Ghoshdastidar

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical ﬁndings and demonstrate the good behaviour of our method on real data.
Researcher Affiliation	Academia	Michaël Perrot Univ Lyon, UJM-Saint-Etienne, CNRS, IOGS, Lab HC UMR 5516, F-42023, SAINT-ETIENNE, France Pascal Mattia Esser , Debarghya Ghoshdastidar Department of Informatics Technical University of Munich
Pseudocode	Yes	Algorithm 1: Comparison-based SPUR
Open Source Code	Yes	3We provide a Python implementation on https://github.com/mperrot/Add S-Clustering
Open Datasets	Yes	We consider two datasets which are subsets of the MNIST test dataset (Le Cun and Cortes, 2010) that originally contains 10000 examples roughly equally distributed among the ten digits: (i) a subset of 2163 examples containing all the 1 and 7 (MNIST 1vs.7), two digits that are visually very similar, and (ii) a randomly selected subset of 2000 examples drawn without replacement and covering all 10 classes (MNIST 10). Second, we consider the Car dataset (Kleindessner and von Luxburg, 2016).
Dataset Splits	No	The paper uses a 'subset of the MNIST test dataset' and 'Simulated data' but does not provide specific training/validation/test dataset splits or methodologies for creating such splits from the raw data for their experiments.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions 'Python implementation' but does not specify version numbers for Python or any other software libraries or dependencies used in the experiments.
Experiment Setup	Yes	As default parameters we use n = 1000, k = 4, ϵ = 0.75, \|T \| = \|Q\| = n(ln n)4 and Fin = N µin , σ2 , Fout = N µout , σ2 with σ = 0.1 and δ = 0.5. The partition is obtained by clustering the rows of Xk using k-means.