Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On the price of explainability for some clustering problems

Authors: Eduardo S Laber, Lucas Murtinho

ICML 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Another contribution is a simple and efficient algorithm for building explainable clusterings for the k-means problem. We provide empirical evidence that its performance is better than the current state of the art for decision-tree based explainable clustering.
Researcher Affiliation	Academia	1Department of Computer Science, PUC-Rio, Brazil.
Pseudocode	Yes	Algorithm 1 Ex-k Center( X : set of points) ... Algorithm 2 Build Tree(X S ) ... Algorithm 3 Ex-Single Link(X)
Open Source Code	Yes	Our code is available in https://github.com/lmurtinho/ExKMC.
Open Datasets	Yes	The datasets Iris, Wine, Breast Cancer, Digits, Covtype, Mice and Newsgroup are available in Python's scikit-learn; Cifar-10 is available in TensorFlow; Anuran and Avila were downloaded from UCI.
Dataset Splits	No	The paper does not provide explicit training, validation, or test dataset splits (e.g., percentages or sample counts). It mentions using datasets and running the KMeans algorithm with default parameters for an initial unrestricted solution.
Hardware Specification	Yes	All our experiments were executed in a MacBook Air, 8 GB of RAM, processor 1.6 GHz Dual Core Intel Core i5, executing macOS Catalina, version 10.15.7.
Software Dependencies	No	The paper mentions software like Python's scikit-learn and TensorFlow, but does not specify their version numbers for reproducibility.
Experiment Setup	Yes	For each iteration, we initially achieve an unrestricted solution Cini by running the KMeans algorithm provided in the scikit-learn package with default parameters.
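
The initial step quoted under Experiment Setup can be illustrated with a minimal sketch, assuming scikit-learn is installed. It is not the authors' code: the helper name initial_unrestricted_solution, the choice of Iris with k = 3, and the fixed seed are assumptions made for illustration. Because the paper does not report library versions, "default parameters" may resolve differently depending on the installed scikit-learn release (for example, the default of n_init has changed between releases).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris


def initial_unrestricted_solution(X, k, seed=None):
    """Fit scikit-learn's KMeans, leaving every parameter at its default
    except the number of clusters and an optional seed (hypothetical helper)."""
    model = KMeans(n_clusters=k, random_state=seed)
    labels = model.fit_predict(X)
    # inertia_ is the k-means cost: the sum of squared distances
    # from each point to its assigned cluster center.
    return labels, model.cluster_centers_, model.inertia_


# Iris is one of the datasets the paper reports loading from scikit-learn.
X, _ = load_iris(return_X_y=True)
labels, centers, cost = initial_unrestricted_solution(X, k=3, seed=0)
print(centers.shape, cost)
```

Fixing the seed here is only to make repeated runs of the sketch comparable; the quoted setup does not say whether the authors fixed one.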