The Fairness-Quality Tradeoff in Clustering

Authors: Rashida Hakim, Ana-Andreea Stoica, Christos Papadimitriou, Mihalis Yannakakis

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically explore faster methods for computing an approximate Pareto front by adapting constraint optimization methods from fair clustering [26] in Section 4. These experiments explore the trade-off between accuracy and speed for our problem by comparing the tighter approximation obtained through our proposed algorithms with the looser Pareto front approximation obtained using faster methods.
Researcher Affiliation Academia Rashida Hakim Columbia University Ana-Andreea Stoica Max Planck Institute for Intelligent Systems, Tübingen Christos H. Papadimitriou Columbia University Mihalis Yannakakis Columbia University
Pseudocode Yes Algorithm 1 Dynamic Programming Algorithm for Computing the Assignment Pareto Front
Open Source Code Yes All code and data used in the paper can be found at this link.
Open Datasets Yes We use the following real-world datasets for our experiments: the Adult dataset and the Census dataset retrieved from the UCI repository (as the Census1990 version) [43, 46], and the Blue Bike trip history dataset.3 The Blue Bike trip history dataset is retrieved from https://bluebikes.com/system-data.
Dataset Splits No The paper states it samples 1,000 data points from each dataset, but it does not explicitly describe how these points are partitioned into training, validation, and test sets. It does not explicitly mention a "validation" split.
Hardware Specification No All experiments are run on local computers, using Python 3.9, k-means++ [5], Network X [33], and CPLEX for the implementation of the FCBC algorithm [52, 26]. The term "local computers" is too general and lacks specific hardware details.
Software Dependencies Yes All experiments are run on local computers, using Python 3.9, k-means++ [5], Network X [33], and CPLEX for the implementation of the FCBC algorithm [52, 26].
Experiment Setup Yes We set δ [0.005, 0.05] for all experiments and k = 2 clusters. Additional experiments for k = 3 are reported in Appendix E, noting qualitatively similar results. For our datasets, we use the k-means++ clustering algorithm as the vanilla clustering that has an approximation ratio of O(log(k)) [5].