Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Axiomatic Definition of Hierarchical Clustering

Authors: Ery Arias-Castro, Elizabeth Coda

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a set of axioms for population hierarchical clustering and show that this axiomatic definition is well-founded and essentially coincides with the cluster tree introduced by Hartigan (1975). ... The organization of the paper is as follows. Section 2 provides some basic notation and definitions. In Section 3, we take an axiomatic approach to defining a hierarchical clustering for a piecewise constant density with connected support. In Section 4, we extend this definition to continuous densities, first to densities with connected support, and then to more general densities. Section 5 is a discussion section where we go over some extensions, some practical considerations, and also discuss some outlook on flat clustering. In an appendix, we provide a close examination of the merge distortion metric of Eldridge et al. (2015) (Section A), and provide further technical details for the special case of a Euclidean space (Section B).
Researcher Affiliation | Academia | Ery Arias-Castro EMAIL Department of Mathematics and Halıcıoğlu Data Science Institute University of California, San Diego La Jolla, CA 92093, USA. Elizabeth Coda EMAIL Department of Mathematics University of California, San Diego La Jolla, CA 92093, USA.
Pseudocode | No | The paper focuses on theoretical definitions, axioms, and mathematical proofs. It does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and does not present empirical experiments that would utilize specific datasets. Therefore, it does not provide access information for open datasets.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets that would require explicit training/test/validation splits.
Hardware Specification | No | The paper is theoretical and focuses on mathematical concepts and proofs, thus no experimental setup or specific hardware used for running experiments is mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any experimental implementation, thus no software dependencies with specific version numbers are provided.
Experiment Setup | No | The paper is theoretical, presenting definitions, axioms, and proofs rather than experimental results. Therefore, no experimental setup details, such as hyperparameters or training configurations, are mentioned.