Hierarchical Clustering via Spreading Metrics

Authors: Aurko Roy, Sebastian Pokutta

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also implement the integer program, its LP relaxation, and the rounding algorithm, and test them on synthetic and real-world data sets to compare the cost of the rounded solutions to the true optimum, as well as to compare their performance to other hierarchical clustering algorithms used in practice. Our experiments suggest that the hierarchies found by this algorithm are often better than those found by linkage-based algorithms, as well as by the k-means algorithm, in terms of the error of the best pruning of the tree compared to the ground truth. (A sketch of this best-pruning error metric appears after the table.)
Researcher Affiliation | Academia | Aurko Roy (College of Computing, Georgia Institute of Technology, Atlanta, GA, USA; aurko@gatech.edu) and Sebastian Pokutta (ISyE, Georgia Institute of Technology, Atlanta, GA, USA; sebastian.pokutta@isye.gatech.edu)
Pseudocode | Yes | Algorithm 1: Iterative rounding algorithm to find a low cost ultrametric. (A short note on ultrametrics, with a sketch, follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We considered synthetic data sets and some data sets from the UCI database [36].
Dataset Splits | No | The paper does not provide specific training/validation/test splits, percentages, or details on cross-validation; it only mentions subsampling for larger data sets. (A generic subsampling sketch is given below.)
Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies | No | The paper mentions using the "dual simplex method" but does not name any software with version numbers, such as libraries, frameworks, or solvers; for example, it does not state which dual simplex solver was used (e.g., CPLEX, Gurobi) or its version. (An illustrative way to pin down this dependency is sketched below.)
Experiment Setup | Yes | For the similarity function κ we limited ourselves to using cosine similarity κcos and the Gaussian kernel κgauss with σ = 1. Since Algorithm 1 requires κ ≥ 0, in practice we use 1 + κcos instead of κcos. (Both kernels are sketched below.)
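
To make the evaluation criterion in the Research Type row concrete: each method is scored by the error of the best pruning of its tree against ground-truth labels. The sketch below is our own illustration, not the authors' code; it uses average linkage on iris (a UCI data set standing in for those used in the paper) and, for simplicity, considers only horizontal cuts of the dendrogram, a restricted form of pruning.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree
from sklearn.datasets import load_iris

def pruning_error(pred, truth):
    """Error of a flat clustering: label each cluster with its majority
    ground-truth class, then count the mislabeled points."""
    mistakes = 0
    for c in np.unique(pred):
        members = truth[pred == c]
        mistakes += len(members) - np.bincount(members).max()
    return mistakes / len(truth)

X, y = load_iris(return_X_y=True)
Z = linkage(X, method="average")   # one of the linkage baselines
cuts = cut_tree(Z)                 # column k = flat clustering at merge level k
best = min(pruning_error(cuts[:, k], y) for k in range(cuts.shape[1]))
print(f"best horizontal-cut error: {best:.3f}")
```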
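On the Pseudocode row: Algorithm 1 outputs an ultrametric, i.e. a metric satisfying the strong triangle inequality d(i, k) ≤ max(d(i, j), d(j, k)), and an ultrametric on n points is exactly the merge-height function of a hierarchical clustering tree. A minimal sketch (our own, with a made-up distance matrix) that checks the property and recovers the tree via single linkage, which is exact on ultrametrics:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def is_ultrametric(D, tol=1e-9):
    """Check d(i, k) <= max(d(i, j), d(j, k)) for every triple (i, j, k)."""
    for j in range(D.shape[0]):
        # broadcast: entry (i, k) of the max is max(D[i, j], D[j, k])
        if np.any(D > np.maximum(D[:, [j]], D[[j], :]) + tol):
            return False
    return True

# Toy ultrametric: {0, 1} merge at height 2, {2, 3} at height 1, all at height 4.
D = np.array([[0, 2, 4, 4],
              [2, 0, 4, 4],
              [4, 4, 0, 1],
              [4, 4, 1, 0]], dtype=float)
assert is_ultrametric(D)
Z = linkage(squareform(D), method="single")  # merge heights reproduce D exactly
```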
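On the Dataset Splits row: the paper reports only that larger data sets were subsampled, without sizes or a procedure, so the following is a generic uniform-subsampling stand-in rather than a reconstruction of what was done.

```python
import numpy as np

def subsample(X, y, n, seed=0):
    """Uniformly subsample n points without replacement; the sample size and
    random procedure are our assumptions, not the paper's."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n, replace=False)
    return X[idx], y[idx]
```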
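On the Software Dependencies row: since the paper names only "the dual simplex method", here is one concrete, open way to request a dual simplex for an LP, via SciPy's HiGHS backend. This is purely illustrative; the authors may have used a different solver entirely.

```python
from scipy.optimize import linprog

# Toy LP: minimize x + 2y subject to x + y >= 1 and 0 <= x, y <= 1.
# The constraint x + y >= 1 is rewritten as -x - y <= -1 for linprog.
res = linprog(c=[1, 2],
              A_ub=[[-1, -1]], b_ub=[-1],
              bounds=[(0, 1), (0, 1)],
              method="highs-ds")  # "highs-ds" selects the HiGHS dual simplex
print(res.x)  # optimal at x = 1, y = 0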
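Finally, the two similarity functions from the Experiment Setup row. The Gaussian kernel's exact parameterization is not given in the excerpt, so the exp(-||x_i - x_j||² / (2σ²)) form below is one common convention; the +1 shift on cosine similarity follows the paper's stated requirement that κ ≥ 0.

```python
import numpy as np

def kappa_cos_shifted(X):
    """1 + cosine similarity, which is >= 0 as Algorithm 1 requires."""
    Xn = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    return 1.0 + Xn @ Xn.T

def kappa_gauss(X, sigma=1.0):
    """Gaussian kernel exp(-||x_i - x_j||^2 / (2 sigma^2)); the paper sets
    sigma = 1 but does not spell out the normalization, which we assume here."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```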