Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sets Clustering

Authors: Ibrahim Jubran, Murad Tukan, Alaa Maalouf, Dan Feldman

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implemented our coreset construction, as well as different sets-k-mean solvers. In this section we evaluate their empirical performance. Open source code for future research can be downloaded from (Jubran et al., 2020).
Researcher Affiliation	Academia	Robotics & Big Data Lab, Department of Computer Science, University of Haifa, Israel. Correspondence to: Ibrahim Jubran <EMAIL>.
Pseudocode	Yes	Algorithm 1 RECURSIVE-ROBUST-MEDIAN(P, k); Algorithm 2 CORESET(P, k, ε, δ); Algorithm 3 MEDIAN(P, k, δ)
Open Source Code	Yes	Open source code and experimental results for document classification and facility locations are also provided. (vi): Open source implementation for reproducing our experiments and for future research (Jubran et al., 2020).
Open Datasets	Yes	Datasets. (i): The LEHD Origin-Destination Employment Statistics (LODES) (lod). It contains information about people that live and work at the united states. (ii): The Reuters-21578 benchmark corpus (Bird et al., 2009).
Dataset Splits	No	The paper does not explicitly describe specific training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification	Yes	The algorithms were implemented in Python 3.7.3 using Sage 9.0 (The Sage Developers, 2020) as explained above on a Lenovo Z70 laptop with an Intel i7-5500U CPU @ 2.40GHZ and 16GB RAM.
Software Dependencies	Yes	The algorithms were implemented in Python 3.7.3 using Sage 9.0 (The Sage Developers, 2020) as explained above on a Lenovo Z70 laptop with an Intel i7-5500U CPU @ 2.40GHZ and 16GB RAM.
Experiment Setup	No	The paper describes the algorithms and their implementation, and uses existing heuristics like Lloyd's algorithm, but does not provide specific hyperparameter values or detailed system-level training settings for these heuristics.
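The Experiment Setup entry notes that the paper relies on existing heuristics such as Lloyd's algorithm without reporting their settings. As a hedged illustration of which settings a reproduction would need, the sketch below applies a Lloyd-style alternating heuristic to the sets-k-means problem. It assumes the objective is the sum, over all input sets, of the squared distance between each set's closest point and its closest center. The function names and the parameters `k`, `max_iter`, `tol`, and `seed` are hypothetical and are not taken from the authors' released code.

```python
import random


def sq_dist(p, c):
    # Squared Euclidean distance between two points given as tuples.
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def sets_cost(sets, centers):
    # Assumed sets-k-means objective: each input set pays the squared
    # distance between its closest point and its closest center.
    return sum(min(sq_dist(p, c) for p in s for c in centers) for s in sets)


def lloyd_sets_kmeans(sets, k, max_iter=100, tol=1e-9, seed=0):
    """Hypothetical Lloyd-style heuristic for sets-k-means (not the authors' code).

    sets: list of input sets, each a non-empty list of equal-length tuples.
    k, max_iter, tol and seed are the settings a reproduction would need to report.
    """
    rng = random.Random(seed)
    # Initialise each center with a random point from a distinct random input set.
    centers = [rng.choice(s) for s in rng.sample(sets, k)]
    prev = cur = sets_cost(sets, centers)
    for _ in range(max_iter):
        # Assignment step: each set picks its closest (point, center) pair,
        # and that point becomes the set's representative for that center.
        groups = [[] for _ in range(k)]
        for s in sets:
            _, best_p, best_j = min(
                (sq_dist(p, c), p, j) for p in s for j, c in enumerate(centers)
            )
            groups[best_j].append(best_p)
        # Update step: move every non-empty center to the mean of its representatives.
        for j, g in enumerate(groups):
            if g:
                centers[j] = tuple(sum(xs) / len(g) for xs in zip(*g))
        cur = sets_cost(sets, centers)
        # Stop once an iteration improves the cost by at most tol.
        if prev - cur <= tol:
            break
        prev = cur
    return centers, cur
```

Each iteration never increases the cost, since the assignment step picks the cheapest representative per set and the mean minimizes squared distance to a fixed group. The local optimum reached still depends on initialisation, so `seed` and `max_iter` would need to be reported alongside `k` for a run to be repeatable.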