reproducibilityindex.ai

The Fairness-Quality Tradeoff in Clustering

Authors: Rashida Hakim, Ana-Andreea Stoica, Christos Papadimitriou, Mihalis Yannakakis

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically explore faster methods for computing an approximate Pareto front by adapting constraint optimization methods from fair clustering [26] in Section 4. These experiments explore the trade-off between accuracy and speed for our problem by comparing the tighter approximation obtained through our proposed algorithms with the looser Pareto front approximation obtained using faster methods.
Researcher Affiliation	Academia	Rashida Hakim Columbia University Ana-Andreea Stoica Max Planck Institute for Intelligent Systems, Tübingen Christos H. Papadimitriou Columbia University Mihalis Yannakakis Columbia University
Pseudocode	Yes	Algorithm 1 Dynamic Programming Algorithm for Computing the Assignment Pareto Front
Open Source Code	Yes	All code and data used in the paper can be found at this link.
Open Datasets	Yes	We use the following real-world datasets for our experiments: the Adult dataset and the Census dataset retrieved from the UCI repository (as the Census1990 version) [43, 46], and the Blue Bike trip history dataset.3 The Blue Bike trip history dataset is retrieved from https://bluebikes.com/system-data.
Dataset Splits	No	The paper states it samples 1,000 data points from each dataset, but it does not explicitly describe how these points are partitioned into training, validation, and test sets. It does not explicitly mention a "validation" split.
Hardware Specification	No	All experiments are run on local computers, using Python 3.9, k-means++ [5], Network X [33], and CPLEX for the implementation of the FCBC algorithm [52, 26]. The term "local computers" is too general and lacks specific hardware details.
Software Dependencies	Yes	All experiments are run on local computers, using Python 3.9, k-means++ [5], Network X [33], and CPLEX for the implementation of the FCBC algorithm [52, 26].
Experiment Setup	Yes	We set δ [0.005, 0.05] for all experiments and k = 2 clusters. Additional experiments for k = 3 are reported in Appendix E, noting qualitatively similar results. For our datasets, we use the k-means++ clustering algorithm as the vanilla clustering that has an approximation ratio of O(log(k)) [5].