Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Sets Clustering
Authors: Ibrahim Jubran, Murad Tukan, Alaa Maalouf, Dan Feldman
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented our coreset construction, as well as different sets-k-mean solvers. In this section we evaluate their empirical performance. Open source code for future research can be downloaded from (Jubran et al., 2020). |
| Researcher Affiliation | Academia | 1Robotics & Big Data Lab, Department of Computer Science, University of Haifa, Israel. Correspondence to: Ibrahim Jubran <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 RECURSIVE-ROBUST-MEDIAN(P, k); Algorithm 2 CORESET(P, k, ε, δ); Algorithm 3 MEDIAN(P, k, δ) |
| Open Source Code | Yes | Open source code and experimental results for document classification and facility locations are also provided. (vi): Open source implementation for reproducing our experiments and for future research (Jubran et al., 2020). Link for open-source code. |
| Open Datasets | Yes | Datasets. (i): The LEHD Origin-Destination Employment Statistics (LODES) (lod). It contains information about people that live and work at the united states. (ii): The Reuters-21578 benchmark corpus (Bird et al., 2009). |
| Dataset Splits | No | The paper does not explicitly describe specific training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | Yes | The algorithms were implemented in Python 3.7.3 using Sage 9.0 (The Sage Developers, 2020) as explained above on a Lenovo Z70 laptop with an Intel i7-5500U CPU @ 2.40GHZ and 16GB RAM. |
| Software Dependencies | Yes | The algorithms were implemented in Python 3.7.3 using Sage 9.0 (The Sage Developers, 2020) as explained above on a Lenovo Z70 laptop with an Intel i7-5500U CPU @ 2.40GHZ and 16GB RAM. |
| Experiment Setup | No | The paper describes the algorithms and their implementation, and uses existing heuristics like Lloyd's algorithm, but does not provide specific hyperparameter values or detailed system-level training settings for these heuristics. |