On the cohesion and separability of average-link for hierarchical agglomerative clustering

Authors: Eduardo Laber, Miguel Batista

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also present experimental results with real datasets that, together with our theoretical analyses, suggest that average-link is a better choice than other related methods when both cohesion and separability are important goals. Finally, to complement our study, we present some experiments with 10 real datasets in which we evaluate, to some extent, if our theoretical results line up with what is observed in practice.
Researcher Affiliation | Academia | Eduardo S. Laber, Departamento de Informática, PUC-RIO, laber@inf.puc-rio.br; Miguel Batista, Departamento de Informática, PUC-RIO, miguel260503@gmail.com
Pseudocode | Yes | Algorithm 2 shows a pseudo-code for average-link.
Open Source Code | Yes | Our supplementary material contains our codes.
Open Datasets | Yes | We employed 10 datasets and used the Euclidean metric to measure distances. For each of them, we executed average-link, complete-linkage and single-linkage, for the following sets of values of k: Small = {k | 2 ≤ k ≤ 10}, Medium = {k | √n − 4 ≤ k ≤ √n + 4} and Large = {k | k = n/i and 2 ≤ i ≤ 10}. More details, as well as the results of our experiment with other distances, can be found in Section F. (Section F lists datasets with academic citations, e.g., 'Airfoil 1501 5 Brooks and Marcolini [2014]')
Dataset Splits | No | The paper mentions evaluating methods for different ranges of 'k' (Small, Medium, Large) and uses 10 datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or type of compute workers. In the NeurIPS checklist, the authors state this information is 'irrelevant to reproducing our experiments or reaching our conclusions'.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or specialized solvers) that were used to conduct the experiments.
Experiment Setup | No | The paper mentions using the Euclidean metric and testing for different ranges of 'k' values. However, it does not provide specific details about experimental setup, such as hyperparameter values, initialization methods, or other system-level training settings.
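The protocol quoted under Open Datasets (three linkage methods, Euclidean distances, three families of k) can be illustrated with a minimal Python sketch. It is not the authors' code, which is in their supplementary material. The function names `k_ranges` and `average_link` are hypothetical. The sketch assumes that the Medium range is centred on √n, that √n and n/i are rounded down, and that average-link means the textbook rule of merging the two clusters with the smallest average pairwise distance, rather than a transcription of the paper's Algorithm 2.

```python
import math
from itertools import combinations


def k_ranges(n):
    """Return the Small, Medium and Large sets of k for a dataset of n points."""
    root = math.isqrt(n)  # assumption: sqrt(n) rounded down
    small = list(range(2, 11))
    # Keep only values of k that are valid cluster counts for n points.
    medium = [k for k in range(root - 4, root + 5) if 2 <= k <= n]
    # assumption: n/i rounded down; duplicates removed for small n.
    large = sorted({n // i for i in range(2, 11)})
    return small, medium, large


def average_link(points, k):
    """Naive average-link: merge the two clusters with the smallest average
    pairwise Euclidean distance until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # combinations yields index pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return clusters


if __name__ == "__main__":
    small, medium, large = k_ranges(100)
    print(small, medium, large)
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (50, 50)]
    print(average_link(pts, 3))
```

This version recomputes every average distance at each step, so it is only practical for small inputs. For datasets of the size used in the paper, a library implementation of agglomerative clustering would be the more realistic choice.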