The Fairness-Quality Tradeoff in Clustering
Authors: Rashida Hakim, Ana-Andreea Stoica, Christos Papadimitriou, Mihalis Yannakakis
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically explore faster methods for computing an approximate Pareto front by adapting constraint optimization methods from fair clustering [26] in Section 4. These experiments explore the trade-off between accuracy and speed for our problem by comparing the tighter approximation obtained through our proposed algorithms with the looser Pareto front approximation obtained using faster methods. |
| Researcher Affiliation | Academia | Rashida Hakim Columbia University Ana-Andreea Stoica Max Planck Institute for Intelligent Systems, Tübingen Christos H. Papadimitriou Columbia University Mihalis Yannakakis Columbia University |
| Pseudocode | Yes | Algorithm 1 Dynamic Programming Algorithm for Computing the Assignment Pareto Front |
| Open Source Code | Yes | All code and data used in the paper can be found at this link. |
| Open Datasets | Yes | We use the following real-world datasets for our experiments: the Adult dataset and the Census dataset retrieved from the UCI repository (as the Census1990 version) [43, 46], and the Blue Bike trip history dataset.3 The Blue Bike trip history dataset is retrieved from https://bluebikes.com/system-data. |
| Dataset Splits | No | The paper states it samples 1,000 data points from each dataset, but it does not explicitly describe how these points are partitioned into training, validation, and test sets. It does not explicitly mention a "validation" split. |
| Hardware Specification | No | All experiments are run on local computers, using Python 3.9, k-means++ [5], Network X [33], and CPLEX for the implementation of the FCBC algorithm [52, 26]. The term "local computers" is too general and lacks specific hardware details. |
| Software Dependencies | Yes | All experiments are run on local computers, using Python 3.9, k-means++ [5], Network X [33], and CPLEX for the implementation of the FCBC algorithm [52, 26]. |
| Experiment Setup | Yes | We set δ [0.005, 0.05] for all experiments and k = 2 clusters. Additional experiments for k = 3 are reported in Appendix E, noting qualitatively similar results. For our datasets, we use the k-means++ clustering algorithm as the vanilla clustering that has an approximation ratio of O(log(k)) [5]. |