reproducibilityindex.ai

Doubly Constrained Fair Clustering

Authors: John Dickerson, Seyed Esmaeili, Jamie H. Morgenstern, Claire Jie Zhang

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we carry out experiments to validate our theoretical findings. We conduct experiments on datasets from the UCI repository [23] to validate our theoretical findings.
Researcher Affiliation	Academia	John Dickerson (1, 2), Seyed A. Esmaeili (3), Jamie Morgenstern (4), and Claire Jie Zhang (4). Affiliations: (1) University of Maryland, College Park; (2) Arthur; (3) Simons Laufer Mathematical Sciences Institute; (4) University of Washington
Pseudocode	Yes	Algorithm 1 DIVIDE, Algorithm 2 DSTOGF+DS, Algorithm 3 ASSIGNMENTGF
Open Source Code	No	The paper does not provide a direct link or explicit statement about the public availability of the source code for the methodology described.
Open Datasets	Yes	We conduct experiments over datasets from the UCI repository [23] to validate our theoretical findings. Specifically, we use the Adult dataset sub-sampled to 20,000 records. Gender is used for group membership while the numeric entries are used to form a point (vector) for each record. We use the Euclidean distance.
Dataset Splits	No	The paper specifies using sub-sampled datasets (e.g., 'Adult dataset sub-sampled to 20,000 records' and 'subsample 6,000 records from the dataset' for Census1990) but does not provide details on training, validation, or test splits, nor does it mention cross-validation.
Hardware Specification	Yes	We use commodity hardware, specifically a MacBook Pro with an Apple M2 chip.
Software Dependencies	Yes	We use Python 3.9, the CPLEX package [38] for solving linear programs, and NetworkX [27] for max-flow rounding. Further, scikit-learn is used for some standard ML-related operations.
Experiment Setup	Yes	Further, for the GF constraints we set the lower and upper proportion bounds to $\beta_h = (1-\delta) r_h$ and $\alpha_h = (1+\delta) r_h$ for each color $h$, where $r_h$ is color $h$'s proportion in the dataset, and we set $\delta = 0.2$. For the DS constraints, since we do not deal with a large number of centers, we set $k^l_h = 0.8 r_h k$ and $k^u_h = r_h k$.
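The Open Datasets row describes the preprocessing only in prose (sub-sample Adult to 20,000 records, use gender as the group, use the numeric entries as the point, measure Euclidean distance). A minimal sketch of that pipeline follows, assuming the numeric Adult columns are already loaded into an array `X` and the gender column into `groups`. The uniform random sampling without replacement, the seed, and the function names `subsample_points` and `euclidean_to_centers` are hypothetical: the row does not say how the 20,000 records were drawn or whether features were scaled.

```python
import numpy as np


def subsample_points(X, groups, n_sub=20_000, seed=0):
    """Draw n_sub records uniformly at random without replacement.

    Uniform sampling and the seed are assumptions for illustration.
    X holds the numeric entries of each record (one row per record);
    groups holds the group label (e.g. gender) of the same records.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    m = min(n_sub, X.shape[0])
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx], groups[idx]


def euclidean_to_centers(points, centers):
    """Return an (n, k) matrix of Euclidean distances from points to centers.

    Broadcasting builds an (n, k, d) array, which stays small when the
    number of centers k is small.
    """
    diff = points[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


if __name__ == "__main__":
    # Toy stand-in for the Adult data: 10 records, 3 numeric features.
    X = np.arange(30).reshape(10, 3)
    groups = np.array(list("MMMMMFFFFF"))
    P, G = subsample_points(X, groups, n_sub=4)
    D = euclidean_to_centers(P, P[:2])
    print(P.shape, G.shape, D.shape)
```

Computing distances only to the k centers, rather than a full 20,000 by 20,000 pairwise matrix, keeps memory proportional to n times k.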
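The Software Dependencies row says NetworkX is used for max-flow rounding but does not give the flow network. The sketch below shows one standard way to round a fractional point-to-center assignment with max-flow. It is a generic illustration, not the paper's Algorithm 3. The network layout (source to each point with capacity 1, point to center for every positive fractional entry, center to sink with capacity equal to the rounded-up fractional load) and the name `round_assignment` are assumptions. Because the fractional solution is itself a feasible flow of value n and all capacities are integers, an integral maximum flow that assigns every point exists.

```python
import math

import networkx as nx


def round_assignment(x, tol=1e-9):
    """Round a fractional assignment x[i][j] (point i to center j) to an integral one.

    Hypothetical helper: each point may only go to a center it was
    fractionally assigned to, and center j receives at most
    ceil(sum_i x[i][j]) points.
    """
    n, k = len(x), len(x[0])
    G = nx.DiGraph()
    for i in range(n):
        G.add_edge("s", ("p", i), capacity=1)
        for j in range(k):
            if x[i][j] > tol:
                G.add_edge(("p", i), ("c", j), capacity=1)
    for j in range(k):
        load = sum(x[i][j] for i in range(n))
        # Subtract tol so float noise such as 2.0000000000000004 rounds up to 2.
        G.add_edge(("c", j), "t", capacity=math.ceil(load - tol))
    value, flow = nx.maximum_flow(G, "s", "t")
    # Integer capacities give an integral flow; a value below n means some
    # point could not be routed (e.g. its fractional row did not sum to 1).
    if value < n:
        raise ValueError("fractional input does not assign every point")
    assignment = []
    for i in range(n):
        # Exactly one outgoing point-to-center edge carries one unit of flow.
        j = next(c[1] for c, f in flow[("p", i)].items() if f > 0.5)
        assignment.append(j)
    return assignment


if __name__ == "__main__":
    x = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    print(round_assignment(x))
```

Capping each center at the ceiling of its fractional load means cluster sizes change by less than one point relative to the LP solution, which is why this style of rounding is a common way to turn a fractional assignment into an integral one.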
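The Experiment Setup row derives every constraint bound from each color's dataset proportion r_h. A small sketch of how those bounds could be computed and checked follows. It assumes GF means every cluster's share of color h lies in [β_h, α_h] and DS means the number of centers of color h lies in [k_h^l, k_h^u]. The function names are hypothetical, and the DS bounds are left unrounded because the row does not say how 0.8 r_h k is converted to an integer.

```python
from collections import Counter


def fairness_bounds(colors, k, delta=0.2, ds_lower_factor=0.8):
    """Per-color GF and DS bounds following the Experiment Setup row.

    r_h is color h's proportion in the dataset. GF uses
    beta_h = (1 - delta) r_h and alpha_h = (1 + delta) r_h; DS uses
    k_lower = 0.8 r_h k and k_upper = r_h k (not rounded here).
    """
    n = len(colors)
    bounds = {}
    for h, count in Counter(colors).items():
        r = count / n
        bounds[h] = {
            "beta": (1 - delta) * r,
            "alpha": (1 + delta) * r,
            "k_lower": ds_lower_factor * r * k,
            "k_upper": r * k,
        }
    return bounds


def satisfies_gf(labels, colors, bounds, tol=1e-9):
    """Check that every cluster's share of each color lies in [beta_h, alpha_h]."""
    clusters = {}
    for lab, h in zip(labels, colors):
        clusters.setdefault(lab, Counter())[h] += 1
    for counts in clusters.values():
        size = sum(counts.values())
        for h, b in bounds.items():
            share = counts[h] / size  # Counter returns 0 for absent colors
            if share < b["beta"] - tol or share > b["alpha"] + tol:
                return False
    return True


def satisfies_ds(center_colors, bounds, tol=1e-9):
    """Check that the number of centers of each color lies in [k_lower, k_upper]."""
    counts = Counter(center_colors)
    return all(
        b["k_lower"] - tol <= counts[h] <= b["k_upper"] + tol
        for h, b in bounds.items()
    )


if __name__ == "__main__":
    colors = ["a"] * 6 + ["b"] * 4
    bounds = fairness_bounds(colors, k=5)
    print(satisfies_gf([0, 0, 0, 1, 1, 1, 0, 0, 1, 1], colors, bounds))
    print(satisfies_ds(["a", "a", "a", "b", "b"], bounds))
```

For example, with 6 points of color a, 4 of color b, and k = 5, the DS bounds are [2.4, 3] for a and [1.6, 2] for b, so an integral solution must pick exactly 3 centers of color a and 2 of color b. Setting k_h^u = r_h k makes the upper bounds sum to k, which is why the DS constraints are tight in this setup.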