Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hierarchical Clustering via Spreading Metrics

Authors: Aurko Roy, Sebastian Pokutta

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility variables, results, and supporting evidence (LLM responses):
Research Type: Experimental
Evidence: "Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage based algorithms. We also implement the integer program, its LP relaxation, and the rounding algorithm and test it on some synthetic and real world data sets to compare the cost of the rounded solutions to the true optimum as well as to compare its performance to other hierarchical clustering algorithms used in practice."
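The comparison the quote describes, projecting an agglomerative hierarchy into flat clusters, can be sketched with standard tooling. This is an illustrative baseline of the "linkage based algorithms" mentioned, not the paper's code; the data is synthetic and scipy is assumed to be available.

```python
# Illustrative sketch (not the paper's implementation): build a
# linkage-based hierarchy and project it to a fixed number of flat
# clusters, the kind of baseline the quote above compares against.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs as stand-in data.
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)),
               rng.normal(3.0, 0.3, (10, 2))])

Z = linkage(X, method="average")                  # agglomerative hierarchy
labels = fcluster(Z, t=2, criterion="maxclust")   # project to 2 flat clusters
print(sorted(set(labels.tolist())))               # -> [1, 2]
```

With well-separated blobs, average linkage recovers the two groups exactly; the interesting regime in the paper is when such projections disagree with the ILP optimum.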
Researcher Affiliation: Academia
Evidence: "Aurko Roy, EMAIL, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA; Sebastian Pokutta, EMAIL, ISyE, Georgia Institute of Technology, Atlanta, GA 30332, USA"
Pseudocode: Yes
Evidence: Algorithm 1: Hierarchical clustering of V from non-trivial ultrametric ... Algorithm 2: Iterative rounding algorithm to find a low cost ultrametric ... Algorithm 3: Hierarchical clustering of V for cost function (1) ... Algorithm 4: Hierarchical clustering of V for cost function (21)
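Algorithm 1 is only named above. As background, the standard way to read a hierarchy off an ultrametric is to threshold it at each distinct distance: points within distance t of each other form the clusters of level t, and for an ultrametric these partitions are nested. A minimal sketch of that construction (our own illustration, not the paper's pseudocode):

```python
# Illustrative sketch: recover a nested hierarchy from an ultrametric
# by thresholding at each distinct distance value. Union-find tracks
# the connected components at each level.

def hierarchy_from_ultrametric(D):
    """Return a list of (threshold, partition) pairs, coarsest last."""
    n = len(D)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    levels = []
    thresholds = sorted({D[i][j] for i in range(n) for j in range(i + 1, n)})
    for t in thresholds:
        for i in range(n):
            for j in range(i + 1, n):
                if D[i][j] <= t:
                    parent[find(i)] = find(j)
        clusters = {}
        for i in range(n):
            clusters.setdefault(find(i), []).append(i)
        levels.append((t, sorted(clusters.values())))
    return levels

# Toy ultrametric on 4 points: {0,1} and {2,3} merge at 1, everything at 2.
D = [[0, 1, 2, 2],
     [1, 0, 2, 2],
     [2, 2, 0, 1],
     [2, 2, 1, 0]]
print(hierarchy_from_ultrametric(D))
# -> [(1, [[0, 1], [2, 3]]), (2, [[0, 1, 2, 3]])]
```

The nesting of the partitions across thresholds is exactly what makes an ultrametric equivalent to a hierarchical clustering.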
Open Source Code: No
Evidence: The paper states: "We implement f-ILP-ultrametric where one can plug in any strictly increasing function f satisfying f(0) = 0. In particular, setting f(x) = x gives us ILP-ultrametric. We use the Mixed Integer Programming (MIP) solver Gurobi 6.5 (Gurobi Optimization, 2015). Similarly, we also implement Algorithms 1, 2, and 4 using Gurobi as our LP solver." However, there is no explicit statement that the implementation is publicly available, and no link is provided.
Open Datasets: Yes
Evidence: "For the first question, we are restricted to working with small data sets since computing an optimum solution to f-ILP-ultrametric is expensive. In this case we consider synthetic data sets of small size and samples of some data sets from the UCI database (Lichman, 2013). ... For comparison we use a mix of synthetic data sets as well as the Wine, Iris, Soybean-small, Digits, Glass, and Wdbc data sets from the UCI repository (Lichman, 2013)."
Dataset Splits: No
Evidence: "For some of the larger data sets, we sample uniformly at random a smaller number of data points and take the average of the error over the different runs." The paper describes this sampling but specifies no train/validation/test splits or percentages, so the exact data partitioning used in the experiments cannot be reproduced.
Hardware Specification: No
Evidence: The paper provides no hardware details (e.g., CPU or GPU model, memory) for its experiments; it names only software, such as the Gurobi solver.
Software Dependencies: Yes
Evidence: "We use the Mixed Integer Programming (MIP) solver Gurobi 6.5 (Gurobi Optimization, 2015)."
Experiment Setup: Yes
Evidence: "For the sake of exposition, we limit ourselves to the following choices for the function f: x, x^2, log(1 + x), e^x - 1. ... For any t ∈ [n - 1] and U ⊆ V we denote by γ_t^U the value of the layer-t objective for solution d_t restricted to the set U, i.e., ... For the similarity function κ : V × V → R we limit ourselves to using cosine similarity and the Gaussian kernel with σ = 1. ... In our experiments we start with a particular value of ε (say 0.5) and halve it till the volumes have the same sign."
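The two similarity functions named in the setup quote can be sketched directly. The function names below are ours, and we assume the common exp(-||u - v||^2 / (2σ^2)) convention for the Gaussian kernel with σ = 1; the paper's exact normalization may differ.

```python
# Sketch of the two similarity functions named in the setup quote:
# cosine similarity and a Gaussian (RBF) kernel with sigma = 1.
# Function names and the 1/(2*sigma^2) convention are our assumptions.
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gaussian_kernel(u, v, sigma=1.0):
    """exp(-||u - v||^2 / (2 sigma^2)) similarity."""
    return float(np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2)))

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(cosine_similarity(u, u))  # identical vectors -> 1.0
print(gaussian_kernel(u, v))    # ||u - v||^2 = 2, so exp(-1) ~ 0.3679
```

Both functions map a pair of data points to a similarity score, which is the role κ plays in the paper's cost functions.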