Individual Preference Stability for Clustering
Authors: Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, Ali Vakilian
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate some of our algorithms and several standard clustering approaches on real data sets. |
| Researcher Affiliation | Collaboration | ¹Toyota Technological Institute at Chicago, USA; ²Google, USA; ³Northwestern University, USA; ⁴Amazon Web Services, Germany; ⁵University of Washington, USA. |
| Pseudocode | Yes | Algorithm 1 provides the pseudocode of our proposed strategy. |
| Open Source Code | Yes | Code available on https://github.com/amazon-research/ip-stability-for-clustering |
| Open Datasets | Yes | We used the Adult data set, the Drug Consumption data set (1885 records), and the Indian Liver Patient data set (579 records), which are all publicly available in the UCI repository (Dua and Graff, 2019). |
| Dataset Splits | No | The paper describes the experimental setup and parameters but does not specify explicit training/validation/test splits with percentages or sample counts. For example, it states: "For k-means++, k-medoids, k-center and k-center GF we show average results obtained from running them for 25 times since their outcomes depend on random initializations.", and for 1-D Euclidean data sets in Appendix H: "We were aiming for k-clusterings with clusters of equal size... and compared our DP approach... with k-means clustering... For the latter we report average results obtained from running the experiment for 100 times." |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It only mentions: "We performed all experiments in Python." |
| Software Dependencies | No | The paper mentions software used but does not provide specific version numbers for reproducibility. It states: "We performed all experiments in Python. We used the standard clustering algorithms from Scikit-learn or SciPy with all parameters set to their default values." |
| Experiment Setup | Yes | In order to study the extent to which these methods produce (un-)stable clusterings, for k = 2, 5, 10, 15, 20, 30, ..., 100, we computed # Uns, Max Vi and Mean Vi as defined above for the resulting k-clusterings. For k-means++, k-medoids, k-center and k-center GF we show average results obtained from running them for 25 times since their outcomes depend on random initializations. (See the sketches below the table.) |
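
For context on the setup row above, the metrics # Uns (number of IP-unstable points), Max Vi and Mean Vi (maximum and mean violation) can be computed directly from a clustering. Below is a minimal Python sketch, assuming a point is IP-stable when its average distance to its own cluster (excluding itself) is at most its average distance to every other cluster, and that the violation is the ratio of those two quantities; the paper's exact normalization may differ.

```python
import numpy as np

def ip_stability_metrics(X, labels):
    """Compute # Uns, Max Vi and Mean Vi for a given clustering (sketch).

    Assumption: a point is IP-stable if its average distance to its own
    cluster (excluding itself) is at most its average distance to every
    other cluster; an unstable point's violation is taken as the ratio
    of the two quantities.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    if len(clusters) < 2:
        return 0, 0.0, 0.0  # with a single cluster, every point is stable

    violations = []
    for i in range(len(X)):
        own = clusters[labels[i]]
        if len(own) < 2:
            continue  # a singleton cluster is trivially stable
        dists = np.linalg.norm(X - X[i], axis=1)
        # dists[i] = 0, so summing over the whole cluster and dividing
        # by |C| - 1 averages over the other members only
        avg_own = dists[own].sum() / (len(own) - 1)
        avg_other = min(dists[idx].mean()
                        for c, idx in clusters.items() if c != labels[i])
        if avg_own > avg_other:
            violations.append(avg_own / avg_other)

    num_unstable = len(violations)
    max_vi = max(violations, default=0.0)
    # averaged over unstable points here; the paper may normalize differently
    mean_vi = float(np.mean(violations)) if violations else 0.0
    return num_unstable, max_vi, mean_vi
```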
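
A companion sketch of the quoted averaging protocol: 25 runs of Scikit-learn's KMeans (which uses k-means++ initialization by default, with all other parameters at their defaults, as the paper states) under different seeds, averaging the three metrics. The function name `averaged_metrics` and the seeding scheme are illustrative, not from the released code.

```python
import numpy as np
from sklearn.cluster import KMeans

def averaged_metrics(X, k, runs=25):
    """Average # Uns, Max Vi and Mean Vi over repeated random
    initializations, mirroring the paper's 25-run protocol."""
    results = []
    for seed in range(runs):
        # k-means++ initialization is the Scikit-learn default
        labels = KMeans(n_clusters=k, random_state=seed).fit_predict(X)
        results.append(ip_stability_metrics(X, labels))
    return np.mean(results, axis=0)

# Usage, e.g. for one value of the paper's grid of cluster counts:
# num_uns, max_vi, mean_vi = averaged_metrics(X, k=10)
```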