Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach

Author: Julien Ah-Pine

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Artificial and real-world benchmarks are used to exemplify these points. From a theoretical standpoint, SNK-AHC provides another interpretation of the classic techniques which relies on the concept of weighted penalized similarities. ... Section 6 is dedicated to the experiments which are carried out on both artificial and real-world data sets."
Researcher Affiliation | Academia | "Julien Ah-Pine, EMAIL, University of Lyon, Lyon 2, ERIC EA3083, 5 avenue Pierre Mendès France, 69676 Bron Cedex, France"
Pseudocode | Yes | "Algorithm 1: General procedure of D-AHC. Algorithm 2: General procedure of K-AHC. Algorithm 3: General procedure of the K-AHC based stored data matrix approach. Algorithm 4: General procedure of SNK-AHC. Algorithm 5: Connected components determination."
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it link to a code repository.
Open Datasets | Yes | "We use both artificial and real-world problems which are freely available at (Fränti et al., 2015) and (Lichman, 2013) respectively. ... The first collection is called the landsat data set https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite) ... The second collection we used is called the pendigits data set https://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits"
Dataset Splits | No | "For each obtained dendrogram, we cut the forest so as to obtain the correct number of clusters, denoted κ. Note that if the number of clusters found by Algorithm 4 is greater than κ, then we keep the partition with κ clusters. Afterward, we compare the resulting partition and the ground truth. The evaluation measure used in this case is the famous adjusted Rand index (Hubert and Arabie, 1985), which is denoted ARI." The paper describes clustering evaluation against a known ground truth but does not specify the train/validation/test splits typically used for supervised learning.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, or memory) used to run the experiments.
Software Dependencies | No | The paper does not name specific software with version numbers for its implementation. It only mentions "popular SVM tools like (Chang and Lin, 2011)" in the context of a default setting for the Gaussian kernel, not as software the authors used with a version.
Experiment Setup | Yes | "Regarding the Gaussian kernel, we remind its definition below: S_ab = exp(−γ‖x_a − x_b‖²), ∀a, b ∈ O. We set γ = 1/q, q being the number of descriptive variables. ... Concerning NNk, the distinct k values were successively set to (the nearest integer of) {100, 90, 75, 50, 25, 10, 1} percent of n, the total number of items. ... the sparsification method we used here is based on a threshold following (26). The different θ values were chosen so that a certain level of sparsity is reached. Precisely, they correspond to the {100, 90, 75, 50, 25, 10, 1}th percentiles of the similarity values distribution."
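The Gaussian kernel and threshold sparsification quoted above can be sketched as follows. The function names are ours, and the convention that sparsification keeps entries at or above the θ threshold and zeroes the rest is an assumption, not a detail the paper states:

```python
import numpy as np

def gaussian_similarity(X, gamma=None):
    # S_ab = exp(-gamma * ||x_a - x_b||^2), with the paper's default gamma = 1/q
    n, q = X.shape
    if gamma is None:
        gamma = 1.0 / q
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)  # guard against tiny negative round-off
    return np.exp(-gamma * d2)

def sparsify(S, p):
    # theta = p-th percentile of the similarity values; entries below theta -> 0
    theta = np.percentile(S, p)
    return np.where(S >= theta, S, 0.0)
```

With p swept over {100, 90, 75, 50, 25, 10, 1} as in the quote, higher percentiles give sparser matrices (p = 100 retains only the maximal entries).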
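The evaluation measure cited in the Dataset Splits row, the adjusted Rand index, is permutation-invariant: identical partitions score 1.0 regardless of how the clusters are named. The paper does not say which implementation it uses; scikit-learn's `adjusted_rand_score` is an illustrative choice:

```python
from sklearn.metrics import adjusted_rand_score

# Two labelings of six items: the same partition under a label swap.
truth = [0, 0, 0, 1, 1, 1]
cut   = [1, 1, 1, 0, 0, 0]  # dendrogram cut with cluster names exchanged

print(adjusted_rand_score(truth, cut))  # identical partitions give ARI = 1.0

# A worse-than-chance partition can score below 0.
print(adjusted_rand_score(truth, [0, 1, 0, 1, 0, 1]))
```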