Fair Clustering Under a Bounded Cost
Authors: Seyed Esmaeili, Brian Brubach, Aravind Srinivasan, John Dickerson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with experimental results on real-world datasets that demonstrate the validity of our algorithms. We validate our algorithms on datasets from the UCI repository [24]. The results here are for k-means clustering; additional experiments are in Appendix F. |
| Researcher Affiliation | Academia | Seyed Esmaeili, University of Maryland (esmaeili@cs.umd.edu); Brian Brubach, Wellesley College (bb100@wellesley.edu); Aravind Srinivasan, University of Maryland (asriniv1@umd.edu); John P. Dickerson, University of Maryland (johnd@umd.edu) |
| Pseudocode | Yes | Algorithm 1: ALG-FCBC(U, UNFAIRNESS-OBJECTIVE) and Algorithm 2: ALG-FABC(S, U, UNFAIRNESS-OBJECTIVE) |
| Open Source Code | No | The paper mentions using third-party libraries such as Scikit-learn and NetworkX but does not state that the authors' own implementation of the described methodology is publicly available. |
| Open Datasets | Yes | We use all 32,561 entries of the Adult dataset [34]. For the Census1990 dataset [41], because of its large size (over 2 million points) we sub-sample the dataset to a range similar to that considered in the fair clustering literature [20, 10]; specifically we use 20,000 data points. We also use the Credit Card dataset [47] which has 30,000 points (results are in Appendix F). |
| Dataset Splits | No | The paper mentions the datasets used but does not specify any training, validation, or test split percentages or sample counts, nor does it refer to predefined splits. |
| Hardware Specification | No | The paper states 'We only use commodity hardware for all experiments' but does not provide specific details such as CPU/GPU models or memory amounts. |
| Software Dependencies | Yes | We only use commodity hardware for all experiments with our programs running on Python 3.6. ... Our LPs are solved using CPLEX [32]. Scikit-learn [46] is called for subroutines such as k-means++. The network-flow rounding is handled using NetworkX [25]. |
| Experiment Setup | Yes | We set the upper and lower bounds for each color to $\alpha_h = (1+\delta) r_h$ and $\beta_h = (1-\delta) r_h$. ... Further, for all experiments we discretize the space by $\epsilon = 1/2^7 < 0.008$. ... We set $\delta = 0.05$ and $k = 10$ for Adult and $\delta = 0.1$ and $k = 5$ for Census1990. |
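
The Open Datasets row notes that Census1990 (over 2 million points) was sub-sampled to 20,000 points, but it does not say how the subset was drawn. A minimal sketch, assuming uniform sampling without replacement and a fixed seed (both are our assumptions, and `subsample` is a hypothetical helper, not the authors' code):

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(rows: Sequence[T], n_keep: int, seed: int = 0) -> List[T]:
    """Draw n_keep rows uniformly at random without replacement.

    Hypothetical helper: uniform sampling with a fixed seed is an assumption,
    not the documented procedure for Census1990.
    """
    if n_keep > len(rows):
        raise ValueError("n_keep exceeds the number of available rows")
    rng = random.Random(seed)  # private RNG so the draw is reproducible
    # Sample indices, then sort them so the subset keeps the original row order.
    idx = sorted(rng.sample(range(len(rows)), n_keep))
    return [rows[i] for i in idx]


# Stand-in for the Census1990 table: row IDs only, "over 2 million" rows.
census_rows = range(2_000_000)
sample = subsample(census_rows, 20_000, seed=42)
print(len(sample))  # 20000
```

A fixed seed makes the subset reproducible; the report does not record which sampling scheme or seed was used.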
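
The Software Dependencies row says Scikit-learn is called for subroutines such as k-means++, and the Research Type row says the reported results are for k-means clustering. The sketch below spells out plain k-means++ (D²) seeding and the k-means objective in pure Python as an illustration of the standard technique; the function names are hypothetical and this is not the authors' code. Scikit-learn's own implementation (`KMeans(init="k-means++")`, or `sklearn.cluster.kmeans_plusplus` in recent versions) is vectorized and uses a greedy variant that evaluates several candidate centers per step.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Plain (non-greedy) k-means++ seeding, a hypothetical helper.

    The first center is chosen uniformly at random; each later center is drawn
    with probability proportional to its squared distance from the nearest
    center picked so far.
    """
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    d2 = [sq_dist(p, centers[0]) for p in points]  # distance to nearest center
    while len(centers) < k:
        if sum(d2) == 0:  # every point already coincides with a center
            break
        i = rng.choices(range(len(points)), weights=d2)[0]
        centers.append(points[i])
        # Update each point's squared distance to its nearest center.
        d2 = [min(old, sq_dist(p, points[i])) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_pp_seeds(pts, k=2, seed=1))
print(kmeans_cost(pts, [(0.0, 0.0), (10.0, 10.0)]))  # 2.0
```

Because points already chosen as centers have zero weight, D² sampling never picks them again, which is why seeding stops early when all remaining weights are zero.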
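
The Experiment Setup row sets per-color bounds $\alpha_h = (1+\delta) r_h$ and $\beta_h = (1-\delta) r_h$. We assume, following common usage in the fair clustering literature, that $r_h$ is the proportion of color $h$ in the whole dataset and that a cluster $C$ satisfies the bounds when $\beta_h |C| \le |C_h| \le \alpha_h |C|$ for every color $h$, where $C_h$ is the set of points of color $h$ in $C$. The sketch below computes these bounds and checks a single cluster; the helper names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def color_bounds(colors: Sequence[Hashable], delta: float) -> Dict[Hashable, Tuple[float, float]]:
    """Map each color h to (beta_h, alpha_h) = ((1 - delta) r_h, (1 + delta) r_h).

    Assumption: r_h is the fraction of all points that have color h.
    """
    n = len(colors)
    return {h: ((1 - delta) * c / n, (1 + delta) * c / n) for h, c in Counter(colors).items()}


def cluster_is_fair(cluster_colors: Sequence[Hashable],
                    bounds: Dict[Hashable, Tuple[float, float]]) -> bool:
    """Check beta_h * |C| <= |C_h| <= alpha_h * |C| for every color h."""
    size = len(cluster_colors)
    counts = Counter(cluster_colors)  # missing colors count as 0
    return all(beta * size <= counts[h] <= alpha * size for h, (beta, alpha) in bounds.items())


# Toy data: 60% of points are color "a", 40% are color "b"; delta = 0.05 as for Adult.
bounds = color_bounds(["a"] * 60 + ["b"] * 40, delta=0.05)
print(cluster_is_fair(["a"] * 6 + ["b"] * 4, bounds))  # True: 60% / 40% matches the data
print(cluster_is_fair(["a"] * 7 + ["b"] * 3, bounds))  # False: 70% "a" exceeds alpha_a = 0.63
```

Multiplying through by $|C|$ instead of dividing avoids a division by zero for an empty cluster, which then satisfies every bound trivially.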