reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Axiomatic Definition of Hierarchical Clustering

Authors: Ery Arias-Castro, Elizabeth Coda

JMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose a set of axioms for population hierarchical clustering and show that this axiomatic definition is well-founded and essentially coincides with the cluster tree introduced by Hartigan (1975). ... The organization of the paper is as follows. Section 2 provides some basic notation and definitions. In Section 3, we take an axiomatic approach to defining a hierarchical clustering for a piecewise constant density with connected support. In Section 4, we extend this definition to continuous densities, first to densities with connected support, and then to more general densities. Section 5 is a discussion section where we go over some extensions, some practical considerations, and also discuss some outlook on flat clustering. In an appendix, we provide a close examination of the merge distortion metric of Eldridge et al. (2015) (Section A), and provide further technical details for the special case of a Euclidean space (Section B).
Researcher Affiliation	Academia	Ery Arias-Castro EMAIL Department of Mathematics and Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, CA 92093, USA. Elizabeth Coda EMAIL Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA.
Pseudocode	No	The paper focuses on theoretical definitions, axioms, and mathematical proofs. It does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets	No	The paper is theoretical and does not present empirical experiments that would utilize specific datasets. Therefore, it does not provide access information for open datasets.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments with datasets that would require explicit training/test/validation splits.
Hardware Specification	No	The paper is theoretical and focuses on mathematical concepts and proofs, thus no experimental setup or specific hardware used for running experiments is mentioned.
Software Dependencies	No	The paper is theoretical and does not describe any experimental implementation, thus no software dependencies with specific version numbers are provided.
Experiment Setup	No	The paper is theoretical, presenting definitions, axioms, and proofs rather than experimental results. Therefore, no experimental setup details, such as hyperparameters or training configurations, are mentioned.