Fair Clustering Under a Bounded Cost

Authors: Seyed Esmaeili, Brian Brubach, Aravind Srinivasan, John Dickerson

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conclude with experimental results on real-world datasets that demonstrate the validity of our algorithms. We validate our algorithms on datasets from the UCI repository [24]. The results here are for k-means clustering; additional experiments are in Appendix F. |
| Researcher Affiliation | Academia | Seyed Esmaeili, University of Maryland, esmaeili@cs.umd.edu; Brian Brubach, Wellesley College, bb100@wellesley.edu; Aravind Srinivasan, University of Maryland, asriniv1@umd.edu; John P. Dickerson, University of Maryland, johnd@umd.edu |
| Pseudocode | Yes | Algorithm 1: ALG-FCBC(U, UNFAIRNESS-OBJECTIVE) and Algorithm 2: ALG-FABC(S, U, UNFAIRNESS-OBJECTIVE) |
| Open Source Code | No | The paper mentions using third-party libraries such as Scikit-learn and NetworkX but does not state that the authors' own implementation of the described methodology is publicly available. |
| Open Datasets | Yes | We use all 32,561 entries of the Adult dataset [34]. For the Census1990 dataset [41], because of its large size (over 2 million points) we sub-sample the dataset to a range similar to that considered in the fair clustering literature [20, 10]; specifically we use 20,000 data points. We also use the Credit Card dataset [47] which has 30,000 points (results are in Appendix F). |
| Dataset Splits | No | The paper mentions the datasets used but does not specify any training, validation, or test split percentages or sample counts, nor does it refer to predefined splits. |
| Hardware Specification | No | The paper states 'We only use commodity hardware for all experiments' but does not provide specific details such as CPU/GPU models or memory amounts. |
| Software Dependencies | Yes | We only use commodity hardware for all experiments with our programs running on Python 3.6. ... Our LPs are solved using CPLEX [32]. Scikit-learn [46] is called for subroutines such as k-means++. The network-flow rounding is handled using NetworkX [25]. |
| Experiment Setup | Yes | We set the upper and lower bounds for each color to α_h = (1+δ)r_h and β_h = (1-δ)r_h. ... Further, for all experiments we discretize the space by ϵ = 1/2^7 < 0.008. ... We set δ = 0.05 and k = 10 for Adult and δ = 0.1 and k = 5 for Census1990. |
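
The Experiment Setup entry describes the per-color proportion bounds only in formula form. The following is a minimal sketch of how those bounds might be computed, assuming that r_h denotes the fraction of the dataset belonging to color h (the entry above does not define it). The function name `proportion_bounds`, the toy color labels, and the constant name `EPSILON` are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter

# Discretization step quoted in the setup: 1/2^7 = 0.0078125 < 0.008.
EPSILON = 2 ** -7


def proportion_bounds(colors, delta):
    """Map each color h to (beta_h, alpha_h).

    Assumes r_h is the fraction of points with color h, so that
    beta_h = (1 - delta) * r_h and alpha_h = (1 + delta) * r_h.
    """
    n = len(colors)
    counts = Counter(colors)
    bounds = {}
    for h, count in counts.items():
        r_h = count / n
        bounds[h] = ((1 - delta) * r_h, (1 + delta) * r_h)
    return bounds


# Toy example (hypothetical data): 6 points of color "a", 4 of color "b".
toy_colors = ["a"] * 6 + ["b"] * 4
toy_bounds = proportion_bounds(toy_colors, delta=0.05)
for h, (beta_h, alpha_h) in sorted(toy_bounds.items()):
    print(f"color {h}: beta = {beta_h:.4f}, alpha = {alpha_h:.4f}")
```

With δ = 0.05, as reported for Adult, a color that makes up 60% of the data would be allowed to occupy roughly 57% to 63% of each cluster under this reading of the bounds.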