Sets Clustering

Authors: Ibrahim Jubran, Murad Tukan, Alaa Maalouf, Dan Feldman

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented our coreset construction, as well as different sets-k-means solvers. In this section we evaluate their empirical performance. Open source code for future research can be downloaded from (Jubran et al., 2020).
Researcher Affiliation | Academia | 1Robotics & Big Data Lab, Department of Computer Science, University of Haifa, Israel. Correspondence to: Ibrahim Jubran <ibrahim.jub@gmail.com>.
Pseudocode | Yes | Algorithm 1 RECURSIVE-ROBUST-MEDIAN(P, k); Algorithm 2 CORESET(P, k, ε, δ); Algorithm 3 MEDIAN(P, k, δ)
Open Source Code | Yes | Open source code and experimental results for document classification and facility locations are also provided. (vi): Open source implementation for reproducing our experiments and for future research (Jubran et al., 2020). Link for open-source code.
Open Datasets | Yes | Datasets. (i): The LEHD Origin-Destination Employment Statistics (LODES) (lod). It contains information about people who live and work in the United States. (ii): The Reuters-21578 benchmark corpus (Bird et al., 2009).
Dataset Splits | No | The paper does not explicitly describe the training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | Yes | The algorithms were implemented in Python 3.7.3 using Sage 9.0 (The Sage Developers, 2020) as explained above on a Lenovo Z70 laptop with an Intel i7-5500U CPU @ 2.40GHz and 16GB RAM.
Software Dependencies | Yes | The algorithms were implemented in Python 3.7.3 using Sage 9.0 (The Sage Developers, 2020) as explained above on a Lenovo Z70 laptop with an Intel i7-5500U CPU @ 2.40GHz and 16GB RAM.
Experiment Setup | No | The paper describes the algorithms and their implementation and uses existing heuristics such as Lloyd's algorithm, but does not provide specific hyperparameter values or detailed system-level settings for these heuristics.
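
The summary above mentions sets-k-means solvers and a Lloyd-style heuristic without stating the objective. The sketch below is one possible illustration, not the authors' released code. It assumes the sets-k-means cost of an input set is the squared Euclidean distance between its closest point and the closest of the k centers, and that the heuristic alternates between picking that closest point per set and averaging the picked points per center. The function names, the initialization, and the iteration count are hypothetical.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def closest_pair(point_set, centers):
    """Return (squared distance, point, center index) for the closest
    point/center pair between one input set and the current centers."""
    candidates = (
        (sq_dist(p, c), p, j)
        for p in point_set
        for j, c in enumerate(centers)
    )
    return min(candidates, key=lambda t: t[0])


def sets_kmeans_cost(sets, centers):
    """Assumed objective: sum over sets of the smallest squared distance
    between any point of the set and any center."""
    return sum(closest_pair(s, centers)[0] for s in sets)


def lloyd_sets_kmeans(sets, k, iters=20, seed=0):
    """Lloyd-style heuristic (illustrative): repeat assign / re-center."""
    rng = random.Random(seed)
    # Hypothetical initialization: one random point from each of k random sets.
    centers = [rng.choice(s) for s in rng.sample(sets, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        # Assignment step: each set contributes only its closest point.
        for s in sets:
            _, p, j = closest_pair(s, centers)
            buckets[j].append(p)
        # Update step: each center moves to the mean of its assigned points;
        # a center with no assigned points stays where it is.
        centers = [
            tuple(sum(coord) / len(b) for coord in zip(*b)) if b else centers[j]
            for j, b in enumerate(buckets)
        ]
    return centers
```

For example, calling `lloyd_sets_kmeans(sets, k=2)` on a list of sets, each a list of coordinate tuples, returns two centers that can be scored with `sets_kmeans_cost(sets, centers)`. Since the paper's hyperparameters for these heuristics are not reported, the defaults here are placeholders only.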