Foundations of Comparison-Based Hierarchical Clustering

Authors: Debarghya Ghoshdastidar, Michaël Perrot, Ulrike von Luxburg

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We also empirically demonstrate the performance of the proposed approaches on several datasets.
Researcher Affiliation | Academia | Debarghya Ghoshdastidar (Department of Informatics, TU Munich, ghoshdas@in.tum.de); Michaël Perrot (Max Planck Institute for Intelligent Systems, michael.perrot@tuebingen.mpg.de); Ulrike von Luxburg (Department of Computer Science, University of Tübingen, and Max Planck Institute for Intelligent Systems, luxburg@informatik.uni-tuebingen.de)
Pseudocode | Yes | Algorithm 1: Agglomerative Hierarchical Clustering. (A generic sketch of the agglomerative template appears below the table.)
Open Source Code | Yes | The code of our methods is available at https://github.com/mperrot/ComparisonHC.
Open Datasets | Yes | We evaluate the different approaches on 3 different datasets commonly used in hierarchical clustering: Zoo, Glass and 20news (Heller and Ghahramani, 2005; Vikram and Dasgupta, 2016).
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It mentions generating data from a 'planted hierarchical model' and generating comparisons on 'standard datasets', but gives no explicit split information needed for reproducibility.
Hardware Specification | No | The paper does not report hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper notes that code is available, but it does not specify software dependencies with version numbers (e.g., a Python version or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | Recall that our generative model has several parameters: the within-cluster mean similarity µ, the variance σ², the separability constant δ, the depth of the planted partition L, and the number of examples in each cluster N0. From the different guarantees presented in Section 3, it is clear that the hardness of the problem depends mainly on the signal-to-noise ratio δ/σ and the probability p of observing samples for the passive methods. Hence, to study the behaviour of the different methods with respect to these two quantities, we set µ = 0.8, σ = 0.1, N0 = 30, and L = 3, and we vary δ ∈ {0.02, 0.04, ..., 0.2} and p ∈ {0.01, 0.02, ..., 0.1, 1}. (A parameter sketch appears below the table.)