Individual Preference Stability for Clustering

Authors: Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, Ali Vakilian

Venue: ICML 2022

Reproducibility assessment: each variable below is listed with its assessed result and the supporting LLM response.
Research Type: Experimental. "We evaluate some of our algorithms and several standard clustering approaches on real data sets."
Researcher Affiliation: Collaboration. (1) Toyota Technological Institute at Chicago, USA; (2) Google, USA; (3) Northwestern University, USA; (4) Amazon Web Services, Germany; (5) University of Washington, USA.
Pseudocode: Yes. "Algorithm 1 provides the pseudocode of our proposed strategy."
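
The notion the paper's algorithms target: per its abstract, a clustering is individually preference (IP) stable if every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Below is a minimal sketch of that stability check only, not the paper's Algorithm 1 itself; the Euclidean metric, the function name `ip_violations`, and the singleton-cluster convention are our assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def ip_violations(X, labels):
    """Per-point violation ratio: average distance to the point's own
    cluster (excluding itself) divided by the smallest average distance
    to any other cluster. A ratio > 1 means the point is IP-unstable.
    Assumes Euclidean distances and at least two clusters."""
    labels = np.asarray(labels)
    D = cdist(X, X)                          # pairwise Euclidean distances
    clusters = np.unique(labels)
    ratios = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                       # do not compare a point to itself
        if not own.any():                    # singleton cluster: trivially stable
            continue
        own_avg = D[i, own].mean()
        other_avg = min(D[i, labels == c].mean()
                        for c in clusters if c != labels[i])
        ratios[i] = own_avg / other_avg
    return ratios
```
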
Open Source Code: Yes. Code is available at https://github.com/amazon-research/ip-stability-for-clustering
Open Datasets: Yes. "We used the Adult data set, the Drug Consumption data set (1885 records), and the Indian Liver Patient data set (579 records), which are all publicly available in the UCI repository (Dua and Graff, 2019)."
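
For illustration, a hedged sketch of loading one of these data sets; fetching the Adult data via its OpenML mirror and restricting to numeric features are our assumptions, not the authors' documented preprocessing. The Drug Consumption and Indian Liver Patient files can likewise be downloaded from the UCI repository and read with pandas.

```python
from sklearn.datasets import fetch_openml

# Adult data set, fetched from its OpenML mirror of the UCI original.
adult = fetch_openml(name="adult", version=2, as_frame=True)
X = adult.data.select_dtypes(include="number").to_numpy()  # numeric columns only
print(X.shape)  # e.g. (48842, 6), depending on the OpenML version
```
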
Dataset Splits: No. The paper describes the experimental setup and parameters but does not specify explicit training/validation/test splits with percentages or sample counts; as an unsupervised clustering study, it instead reports averages over repeated runs. For example, it states: "For k-means++, k-medoids, k-center and k-center GF we show average results obtained from running them for 25 times since their outcomes depend on random initializations." And, for the 1-D Euclidean data sets in Appendix H: "We were aiming for k-clusterings with clusters of equal size... and compared our DP approach... with k-means clustering... For the latter we report average results obtained from running the experiment for 100 times."
Hardware Specification: No. The paper does not describe the hardware used for its experiments; it states only: "We performed all experiments in Python."
Software Dependencies: No. The paper names the software it used but provides no version numbers: "We performed all experiments in Python. We used the standard clustering algorithms from Scikit-learn or SciPy with all parameters set to their default values."
Experiment Setup: Yes. "In order to study the extent to which these methods produce (un-)stable clusterings, for k = 2, 5, 10, 15, 20, 30, ..., 100, we computed # Uns, Max Vi and Mean Vi as defined above for the resulting k-clusterings. For k-means++, k-medoids, k-center and k-center GF we show average results obtained from running them for 25 times since their outcomes depend on random initializations."
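
A minimal sketch of that protocol, assuming: scikit-learn's KMeans as the k-means++ representative (k-means++ is its default initialization), a synthetic stand-in for the data matrix X, the `ip_violations` helper sketched above, and our reading of the metrics, with # Uns as the number of unstable points and Max Vi / Mean Vi as the maximum and mean violation ratio over them. The paper's exact metric definitions and algorithm implementations may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(500, 5)                        # stand-in for a real data set

ks = [2, 5, 10, 15, 20] + list(range(30, 101, 10))
for k in ks:
    results = []
    for seed in range(25):                   # average over 25 random initializations
        labels = KMeans(n_clusters=k, random_state=seed).fit_predict(X)
        ratios = ip_violations(X, labels)    # sketched above
        unstable = ratios[ratios > 1.0]
        results.append((len(unstable),
                        unstable.max() if len(unstable) else 1.0,
                        unstable.mean() if len(unstable) else 1.0))
    n_uns, max_vi, mean_vi = np.mean(results, axis=0)
    print(f"k={k:3d}  #Uns={n_uns:6.1f}  MaxVi={max_vi:5.2f}  MeanVi={mean_vi:5.2f}")
```
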