Foundations of Comparison-Based Hierarchical Clustering

Authors: Debarghya Ghoshdastidar, Michaël Perrot, Ulrike von Luxburg

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We also empirically demonstrate the performance of the proposed approaches on several datasets.
Researcher Affiliation | Academia | Debarghya Ghoshdastidar (Department of Informatics, TU Munich, ghoshdas@in.tum.de); Michaël Perrot (Max Planck Institute for Intelligent Systems, michael.perrot@tuebingen.mpg.de); Ulrike von Luxburg (Department of Computer Science, University of Tübingen, and Max Planck Institute for Intelligent Systems, luxburg@informatik.uni-tuebingen.de)
Pseudocode | Yes | Algorithm 1: Agglomerative Hierarchical Clustering. (A generic sketch of the agglomerative template appears below the table.)
Open Source Code | Yes | The code of our methods is available at https://github.com/mperrot/ComparisonHC.
Open Datasets | Yes | We evaluate the different approaches on 3 different datasets commonly used in hierarchical clustering: Zoo, Glass and 20news (Heller and Ghahramani, 2005; Vikram and Dasgupta, 2016).
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It mentions generating data from a 'planted hierarchical model' and generating comparisons on 'standard datasets', but gives no explicit split information needed for reproducibility.
Hardware Specification | No | The paper does not report hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper notes that code is available, but it does not specify software dependencies with version numbers (e.g., a Python version or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | Recall that our generative model has several parameters: the within-cluster mean similarity µ, the variance σ², the separability constant δ, the depth of the planted partition L, and the number of examples in each cluster N0. From the different guarantees presented in Section 3, it is clear that the hardness of the problem depends mainly on the signal-to-noise ratio δ/σ and the probability p of observing samples for the passive methods. Hence, to study the behaviour of the different methods with respect to these two quantities, we set µ = 0.8, σ = 0.1, N0 = 30, and L = 3, and we vary δ ∈ {0.02, 0.04, ..., 0.2} and p ∈ {0.01, 0.02, ..., 0.1, 1}. (A parameter sketch appears below the table.)