Individual Preference Stability for Clustering

Authors: Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, Ali Vakilian

Venue: ICML 2022

Reproducibility assessment: each variable below is listed with its assessed result and the supporting LLM response.
Research Type: Experimental. "We evaluate some of our algorithms and several standard clustering approaches on real data sets."
Researcher Affiliation: Collaboration. (1) Toyota Technological Institute at Chicago, USA; (2) Google, USA; (3) Northwestern University, USA; (4) Amazon Web Services, Germany; (5) University of Washington, USA.
Pseudocode: Yes. "Algorithm 1 provides the pseudocode of our proposed strategy."
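
The notion the paper's algorithms target: per its abstract, a clustering is individually preference (IP) stable if every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Below is a minimal sketch of that stability check only, not the paper's Algorithm 1 itself; the Euclidean metric, the function name `ip_violations`, and the singleton-cluster convention are our assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def ip_violations(X, labels):
    """Per-point violation ratio: average distance to the point's own
    cluster (excluding itself) divided by the smallest average distance
    to any other cluster. A ratio > 1 means the point is IP-unstable.
    Assumes Euclidean distances and at least two clusters."""
    labels = np.asarray(labels)
    D = cdist(X, X)                          # pairwise Euclidean distances
    clusters = np.unique(labels)
    ratios = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                       # do not compare a point to itself
        if not own.any():                    # singleton cluster: trivially stable
            continue
        own_avg = D[i, own].mean()
        other_avg = min(D[i, labels == c].mean()
                        for c in clusters if c != labels[i])
        ratios[i] = own_avg / other_avg
    return ratios
```
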
Open Source Code: Yes. Code is available at https://github.com/amazon-research/ip-stability-for-clustering
Open Datasets: Yes. "We used the Adult data set, the Drug Consumption data set (1885 records), and the Indian Liver Patient data set (579 records), which are all publicly available in the UCI repository (Dua and Graff, 2019)."
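
For illustration, a hedged sketch of loading one of these data sets; fetching the Adult data via its OpenML mirror and restricting to numeric features are our assumptions, not the authors' documented preprocessing. The Drug Consumption and Indian Liver Patient files can likewise be downloaded from the UCI repository and read with pandas.

```python
from sklearn.datasets import fetch_openml

# Adult data set, fetched from its OpenML mirror of the UCI original.
adult = fetch_openml(name="adult", version=2, as_frame=True)
X = adult.data.select_dtypes(include="number").to_numpy()  # numeric columns only
print(X.shape)  # e.g. (48842, 6), depending on the OpenML version
```
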
Dataset Splits: No. The paper describes the experimental setup and parameters but does not specify explicit training/validation/test splits with percentages or sample counts; as an unsupervised clustering study, it instead reports averages over repeated runs. For example, it states: "For k-means++, k-medoids, k-center and k-center GF we show average results obtained from running them for 25 times since their outcomes depend on random initializations." And, for the 1-D Euclidean data sets in Appendix H: "We were aiming for k-clusterings with clusters of equal size... and compared our DP approach... with k-means clustering... For the latter we report average results obtained from running the experiment for 100 times."
Hardware Specification: No. The paper does not describe the hardware used for its experiments; it states only: "We performed all experiments in Python."
Software Dependencies: No. The paper names the software it used but provides no version numbers: "We performed all experiments in Python. We used the standard clustering algorithms from Scikit-learn or SciPy with all parameters set to their default values."
Experiment Setup: Yes. "In order to study the extent to which these methods produce (un-)stable clusterings, for k = 2, 5, 10, 15, 20, 30, ..., 100, we computed # Uns, Max Vi and Mean Vi as defined above for the resulting k-clusterings. For k-means++, k-medoids, k-center and k-center GF we show average results obtained from running them for 25 times since their outcomes depend on random initializations."
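
A minimal sketch of that protocol, assuming: scikit-learn's KMeans as the k-means++ representative (k-means++ is its default initialization), a synthetic stand-in for the data matrix X, the `ip_violations` helper sketched above, and our reading of the metrics, with # Uns as the number of unstable points and Max Vi / Mean Vi as the maximum and mean violation ratio over them. The paper's exact metric definitions and algorithm implementations may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(500, 5)                        # stand-in for a real data set

ks = [2, 5, 10, 15, 20] + list(range(30, 101, 10))
for k in ks:
    results = []
    for seed in range(25):                   # average over 25 random initializations
        labels = KMeans(n_clusters=k, random_state=seed).fit_predict(X)
        ratios = ip_violations(X, labels)    # sketched above
        unstable = ratios[ratios > 1.0]
        results.append((len(unstable),
                        unstable.max() if len(unstable) else 1.0,
                        unstable.mean() if len(unstable) else 1.0))
    n_uns, max_vi, mean_vi = np.mean(results, axis=0)
    print(f"k={k:3d}  #Uns={n_uns:6.1f}  MaxVi={max_vi:5.2f}  MeanVi={mean_vi:5.2f}")
```
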