Doubly Constrained Fair Clustering
Authors: John Dickerson, Seyed Esmaeili, Jamie H. Morgenstern, Claire Jie Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we carry out experiments to validate our theoretical findings. We conduct experiments over datasets from the UCI repository [23]. |
| Researcher Affiliation | Academia | John Dickerson¹,², Seyed A. Esmaeili³, Jamie Morgenstern⁴, and Claire Jie Zhang⁴. ¹University of Maryland, College Park; ²Arthur; ³Simons Laufer Mathematical Sciences Institute; ⁴University of Washington |
| Pseudocode | Yes | Algorithm 1 DIVIDE, Algorithm 2 DSTOGF+DS, Algorithm 3 ASSIGNMENTGF |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the public availability of the source code for the methodology described. |
| Open Datasets | Yes | We conduct experiments over datasets from the UCI repository [23] to validate our theoretical findings. Specifically, we use the Adult dataset sub-sampled to 20,000 records. Gender is used for group membership while the numeric entries are used to form a point (vector) for each record. We use the Euclidean distance. |
| Dataset Splits | No | The paper specifies using sub-sampled datasets (e.g., 'Adult dataset sub-sampled to 20,000 records' and 'subsample 6,000 records from the dataset' for Census1990) but does not provide details on training, validation, or test splits, nor does it mention cross-validation. |
| Hardware Specification | Yes | We use commodity hardware, specifically a MacBook Pro with an Apple M2 chip. |
| Software Dependencies | Yes | We use Python 3.9, the CPLEX package [38] for solving linear programs and NetworkX [27] for max-flow rounding. Further, Scikit-learn is used for some standard ML-related operations. |
| Experiment Setup | Yes | Further, for the GF constraints we set the lower and upper proportion bounds to $\beta_h = (1 - \delta) r_h$ and $\alpha_h = (1 + \delta) r_h$ for each color $h$, where $r_h$ is color $h$'s proportion in the dataset, and we set $\delta = 0.2$. For the DS constraints, since we do not deal with a large number of centers, we set $k^l_h = 0.8 r_h k$ and $k^u_h = r_h k$. |
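
The parameter choices in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' code (none is released): the function name `fairness_bounds`, its arguments, and the dictionary keys are hypothetical, and the DS bounds are left as real numbers because the quoted text does not say whether or how they are rounded.

```python
from collections import Counter


def fairness_bounds(colors, k, delta=0.2, ds_lower_factor=0.8):
    """Compute per-color GF and DS bounds from group labels.

    colors: one group label per record (e.g. the gender attribute).
    k: number of centers.
    delta: GF slack; the quoted setup uses 0.2.
    ds_lower_factor: multiplier for the DS lower bound; the quoted setup uses 0.8.
    All names here are hypothetical, chosen only for this sketch.
    """
    n = len(colors)
    bounds = {}
    for h, count in Counter(colors).items():
        r_h = count / n  # proportion of color h in the dataset
        bounds[h] = {
            "r": r_h,
            "beta": (1 - delta) * r_h,           # GF lower proportion bound
            "alpha": (1 + delta) * r_h,          # GF upper proportion bound
            "k_lower": ds_lower_factor * r_h * k,  # DS lower bound on centers (unrounded)
            "k_upper": r_h * k,                  # DS upper bound on centers (unrounded)
        }
    return bounds


# Toy labels, not the Adult dataset: 6 records of one color, 4 of another.
toy_colors = ["a"] * 6 + ["b"] * 4
toy_bounds = fairness_bounds(toy_colors, k=10)
```

With a real dataset, `colors` would be the group-membership column of the sub-sampled records, and the resulting values would feed the GF and DS constraints of whatever solver is used.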