Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Ultrametric Cluster Hierarchies: I Want ‘em All!

Authors: Andrew Draganov, Pascal Weber, Rasmus Jørgensen, Anna Beer, Claudia Plant, Ira Assent

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conclude by verifying the utility of our proposed techniques across datasets, hierarchies, and partitioning schemes. Our extensive experiments confirm this is not a cherry-picked example; many SHiP-derived combinations consistently produce novel, high-quality clusterings that compete with or outperform state-of-the-art algorithms with minimal additional computational cost.
Researcher Affiliation	Academia	(1) Department of Computer Science, Aarhus University, Aarhus, Denmark; (2) IAS-8: Data Analytics and Machine Learning, Forschungszentrum Jülich, Germany; (3) Faculty of Computer Science, University of Vienna, Vienna, Austria; (4) UniVie Doctoral School Computer Science, University of Vienna, Vienna, Austria; (5) Data Science @ Uni Vienna, University of Vienna, Vienna, Austria
Pseudocode	Yes	Algorithm 1: LCA-tree Farthest First Traversal; Algorithm 2: Corresponding Centers; Algorithm 3: Ultrametric-k Center; Algorithm 4: Corresponding-z-Centers; Algorithm 5: Ultrametric-kz; Algorithm 6: Best Clustering; Algorithm 7: k-centroid-annotation; Algorithm 8: k-centroid-hierarchy; Algorithm 9: k-centroid-cluster; Algorithm 10: Optimize Annotations
Open Source Code	Yes	https://github.com/pasiweber/SHiP-framework/ (implementation and experiments); https://pypi.org/project/SHiP-framework/ (Python interface package)
Open Datasets	Yes	Table 4 lists the datasets on which we validated and compared the proposed framework SHiP to other competitors. Mice [50], HAR [50], letterrec. [50], Pendigits [50], COIL20 [60], COIL100 [59], cmu_faces [50], Optdigits [50], USPS [37], MNIST [46].
Dataset Splits	No	Table 4 lists the datasets on which we validated and compared the proposed framework SHiP to other competitors. We evaluate the clustering quality with the adjusted rand index (ARI) [36], treating points labeled as noise as singleton clusters. NMI [70], AMI [74] and correlation coefficient [29] results can be found in Appendix E. Both our runtime and accuracy tables report the mean over 10 runs. All experiments were performed on 2x Intel 6326 with 16 cores each and 512GB RAM.
Hardware Specification	Yes	All experiments were performed on 2x Intel 6326 with 16 cores each and 512GB RAM.
Software Dependencies	Yes	We build KD trees, Cover trees using the C++ library mlpack [20], and build HST-DPO trees using the C++ code from https://github.com/yzengal/ICDE21-HST.
Experiment Setup	Yes	We default to µ = 5 for all experiments. The paper details the experimental settings including dataset characteristics (Table 4), implementation details (Section 5.3), parameter settings (e.g., µ=5 for dc-dist), and evaluation metrics (ARI, NMI, AMI, correlation coefficient).
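
Illustration (not from the paper): the Dataset Splits row quotes the evaluation protocol as ARI "treating points labeled as noise as singleton clusters." The sketch below shows one way that protocol could be computed. It assumes integer cluster labels and that noise is marked with the label -1; both are assumptions made for this example, as are the function names, and none of it is taken from the SHiP code base.

```python
from collections import Counter
from math import comb

NOISE = -1  # assumed noise marker; the paper's implementation may use another convention


def noise_to_singletons(labels, noise_label=NOISE):
    """Return a copy of `labels` where every noise point gets its own fresh cluster id."""
    used = {label for label in labels if label != noise_label}
    next_id = max(used, default=-1) + 1  # first id not used by a real cluster
    relabeled = []
    for label in labels:
        if label == noise_label:
            relabeled.append(next_id)
            next_id += 1
        else:
            relabeled.append(label)
    return relabeled


def adjusted_rand_index(truth, pred):
    """Standard ARI computed from the contingency table of two labelings."""
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: the labelings trivially agree
    same_in_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_in_truth * same_in_pred / total_pairs
    maximum = (same_in_truth + same_in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are all singletons
    return (same_in_both - expected) / (maximum - expected)


truth = [0, 0, 1, 1]
pred = [0, 0, NOISE, NOISE]  # second cluster reported entirely as noise
print(adjusted_rand_index(truth, pred))                       # noise kept as one group
print(adjusted_rand_index(truth, noise_to_singletons(pred)))  # noise as singletons
```

In this toy case, leaving the noise label untouched would score the prediction as a perfect match, because the two noise points are treated as one cluster. Converting them to singletons lowers the score, which is the stricter reading the quoted protocol describes.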