reproducibilityindex.ai

Fair Clustering Under a Bounded Cost

Authors: Seyed Esmaeili, Brian Brubach, Aravind Srinivasan, John Dickerson

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conclude with experimental results on real-world datasets that demonstrate the validity of our algorithms. We validate our algorithms on datasets from the UCI repository [24]. The results here are for k-means clustering; additional experiments are in Appendix F.
Researcher Affiliation	Academia	Seyed Esmaeili University of Maryland esmaeili@cs.umd.edu Brian Brubach Wellesley College bb100@wellesley.edu Aravind Srinivasan University of Maryland asriniv1@umd.edu John P. Dickerson University of Maryland johnd@umd.edu
Pseudocode	Yes	Algorithm 1: ALG-FCBC(U, UNFAIRNESS-OBJECTIVE) and Algorithm 2: ALG-FABC(S, U, UNFAIRNESS-OBJECTIVE)
Open Source Code	No	The paper mentions using third-party libraries like Scikit-learn and NetworkX but does not state that the authors' own implementation code for the described methodology is publicly available.
Open Datasets	Yes	We use all 32,561 entries of the Adult dataset [34]. For the Census1990 dataset [41], because of its large size (over 2 million points) we sub-sample the dataset to a range similar to that considered in the fair clustering literature [20, 10]; specifically we use 20,000 data points. We also use the Credit Card dataset [47] which has 30,000 points (results are in Appendix F).
Dataset Splits	No	The paper mentions the datasets used but does not specify any training, validation, or test split percentages or sample counts, nor does it refer to predefined splits.
Hardware Specification	No	The paper states 'We only use commodity hardware for all experiments' but does not provide specific details such as CPU/GPU models or memory amounts.
Software Dependencies	Yes	We only use commodity hardware for all experiments with our programs running on Python 3.6. ... Our LPs are solved using CPLEX [32]. Scikit-learn [46] is called for subroutines such as k-means++. The network-flow rounding is handled using NetworkX [25].
Experiment Setup	Yes	We set the upper and lower bounds for each color to $\alpha_h = (1+\delta) r_h$ and $\beta_h = (1-\delta) r_h$. ... Further, for all experiments we discretize the space by $\epsilon = \frac{1}{2^7} < 0.008$. ... We set $\delta = 0.05$ and $k = 10$ for Adult and $\delta = 0.1$ and $k = 5$ for Census1990.
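To make the bounds in the Experiment Setup row concrete, here is a minimal sketch. It is not the authors' code, which is not released. It assumes that r_h is the fraction of points with color h in the whole dataset and that a cluster meets the constraint when its fraction of color h lies in [β_h, α_h]. The function names `color_bounds` and `max_proportion_violation` are hypothetical, and the "largest violation" measure is only an illustration. It is not necessarily one of the unfairness objectives the paper optimizes.

```python
from collections import Counter


def color_bounds(colors, delta):
    """Return {h: (beta_h, alpha_h)} with beta_h = (1 - delta) * r_h and
    alpha_h = (1 + delta) * r_h, where r_h is the dataset-wide share of color h
    (assumed interpretation of r_h)."""
    n = len(colors)
    bounds = {}
    for h, count in Counter(colors).items():
        r_h = count / n
        bounds[h] = ((1 - delta) * r_h, (1 + delta) * r_h)
    return bounds


def max_proportion_violation(colors, labels, bounds):
    """Hypothetical illustration: largest amount by which any cluster's share
    of any color falls outside [beta_h, alpha_h]; 0.0 means every cluster
    satisfies the bounds."""
    clusters = {}
    for color, label in zip(colors, labels):
        clusters.setdefault(label, []).append(color)
    worst = 0.0
    for members in clusters.values():
        size = len(members)
        counts = Counter(members)
        for h, (beta, alpha) in bounds.items():
            p = counts.get(h, 0) / size  # share of color h in this cluster
            worst = max(worst, beta - p, p - alpha)
    return worst


# Discretization step quoted in the setup: 1/2^7 = 0.0078125, just under 0.008.
EPSILON = 1 / 2**7

if __name__ == "__main__":
    colors = ["a"] * 6 + ["b"] * 4           # r_a = 0.6, r_b = 0.4 (toy data)
    labels = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]  # toy, deliberately unbalanced assignment
    bounds = color_bounds(colors, delta=0.05)  # delta used for Adult in the paper
    print(bounds)
    print(max_proportion_violation(colors, labels, bounds))
```

With δ = 0.05, a color making up 60% of the data must make up between 57% and 63% of every cluster. In the toy assignment above, one cluster is 80% color "a", so the bounds are violated. A balanced assignment such as `[0, 0, 0, 1, 1, 1, 0, 0, 1, 1]` yields a violation of 0.0.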