Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Hierarchical Clustering via Spreading Metrics
Authors: Aurko Roy, Sebastian Pokutta
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also implement the integer program, its LP relaxation, and the rounding algorithm and test it on some synthetic and real world data sets to compare the cost of the rounded solutions to the true optimum, as well as to compare its performance to other hierarchical clustering algorithms used in practice. Our experiments suggest that the hierarchies found by this algorithm are often better than the ones found by linkage based algorithms as well as the k-means algorithm in terms of the error of the best pruning of the tree compared to the ground truth. |
| Researcher Affiliation | Academia | Aurko Roy1 and Sebastian Pokutta2 1College of Computing, Georgia Institute of Technology, Atlanta, GA, USA. Email: EMAIL 2ISyE, Georgia Institute of Technology, Atlanta, GA, USA. Email: EMAIL |
| Pseudocode | Yes | Algorithm 1: Iterative rounding algorithm to find a low cost ultrametric |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We considered synthetic data sets and some data sets from the UCI database [36]. |
| Dataset Splits | No | The paper does not provide specific training/validation/test splits, percentages, or details on cross-validation. It only mentions subsampling for larger datasets. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "dual simplex method" but does not specify any software names with version numbers, such as libraries, frameworks, or solvers. For example, it does not state which specific dual simplex solver was used (e.g., CPLEX, Gurobi) or its version. |
| Experiment Setup | Yes | For the similarity function κ we limited ourselves to using cosine similarity κcos and the Gaussian kernel κgauss with σ = 1. Since Algorithm 1 requires κ ≥ 0, in practice we use 1 + κcos instead of κcos. |
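The similarity functions quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian kernel is assumed to use the standard exp(-‖x - y‖² / (2σ²)) form with σ = 1, which the excerpt does not spell out, and the `1 + κcos` shift makes the cosine similarity nonnegative (cosine lies in [-1, 1], so the shifted value lies in [0, 2]) as Algorithm 1 requires.

```python
import numpy as np

def kappa_cos(x, y):
    """Cosine similarity between two vectors; ranges over [-1, 1]."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def kappa_gauss(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; assumed standard form with bandwidth sigma."""
    return float(np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2)))

def kappa_cos_shifted(x, y):
    """Shifted cosine similarity 1 + kappa_cos, guaranteed nonnegative,
    as the paper uses in practice since Algorithm 1 requires kappa >= 0."""
    return 1.0 + kappa_cos(x, y)
```

For example, two orthogonal unit vectors have `kappa_cos` equal to 0 and `kappa_cos_shifted` equal to 1, so the shifted similarity never goes negative even for anti-correlated points.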