Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

On the cohesion and separability of average-link for hierarchical agglomerative clustering

Authors: Eduardo Laber, Miguel Batista

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also present experimental results with real datasets that, together with our theoretical analyses, suggest that average-link is a better choice than other related methods when both cohesion and separability are important goals. Finally, to complement our study, we present some experiments with 10 real datasets in which we evaluate, to some extent, if our theoretical results line up with what is observed in practice.
Researcher Affiliation	Academia	Eduardo S. Laber Departmento de Informática, PUC-RIO EMAIL Miguel Batista Departmento de Informática, PUC-RIO EMAIL
Pseudocode	Yes	Algorithm 2 shows a pseudo-code for average-link.
Open Source Code	Yes	Our supplementary material contains our codes.
Open Datasets	Yes	We employed 10 datasets and used the Euclidean metric to measure distances. For each of them, we executed average-link, complete-linkage and single-linkage, for the following sets of values of k: Small = {k | 2 ≤ k ≤ 10}, Medium = {k | √n − 4 ≤ k ≤ √n + 4} and Large = {k | k = n/i and 2 ≤ i ≤ 10}. More details, as well as the results of our experiment with other distances, can be found in Section F. (Section F lists datasets with academic citations, e.g., 'Airfoil 1501 5 Brooks and Marcolini [2014]')
Dataset Splits	No	The paper mentions evaluating methods for different ranges of 'k' (Small, Medium, Large) and uses 10 datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or type of compute workers. In the NeurIPS checklist, the authors state this information is 'irrelevant to reproducing our experiments or reaching our conclusions'.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or specialized solvers) that were used to conduct the experiments.
Experiment Setup	No	The paper mentions using the Euclidean metric and testing for different ranges of 'k' values. However, it does not provide specific details about experimental setup, such as hyperparameter values, initialization methods, or other system-level training settings.
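
The setup quoted in the table (three linkage rules, Euclidean distance, and three ranges of k) can be illustrated with a minimal sketch. This is not the authors' code, which is in their supplementary material. The function names k_ranges, linkage_distance and agglomerate are hypothetical. The sketch also assumes that the Medium range is centred on the integer part of √n, that n/i is rounded down, and that values of k below 2 or above n are dropped. The clustering routine is a naive implementation intended only for small inputs.

```python
import math
from itertools import combinations


def k_ranges(n):
    """Return the Small, Medium and Large sets of k for a dataset with n points.

    Assumptions (not confirmed by the source): the Medium range is centred on
    the integer part of sqrt(n), and n/i is rounded down.
    """
    root = math.isqrt(n)
    small = [k for k in range(2, 11) if k <= n]
    medium = [k for k in range(max(2, root - 4), root + 5) if k <= n]
    large = sorted({n // i for i in range(2, 11) if n // i >= 2})
    return {"Small": small, "Medium": medium, "Large": large}


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices)."""
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # closest pair of points
        return min(dists)
    if method == "complete":    # farthest pair of points
        return max(dists)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, k, method="average"):
    """Naive agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # combinations yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, method
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return clusters


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    for method in ("average", "complete", "single"):
        print(method, agglomerate(points, 3, method))
    print(k_ranges(100))
```

Because the paper reports no software dependencies, the sketch uses only the Python standard library. For datasets of realistic size, an optimized library routine would be the practical choice.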