Better Algorithms for Individually Fair $k$-Clustering

Authors: Maryam Negahbani, Deeparnab Chakrabarty

NeurIPS 2021

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental
"In this paper, our main contribution is to use Linear Programming (LP) techniques to obtain better algorithms for this problem, both in theory and in practice. We prove that by modifying known LP rounding techniques, one gets a worst-case guarantee on the objective which is much better than in MV20, and empirically, this objective is extremely close to the optimal. Furthermore, our theoretical fairness guarantees are comparable with MV20 in theory, and empirically, we obtain noticeably fairer solutions. Although solving the LP exactly might be prohibitive, we demonstrate that in practice, a simple sparsification technique drastically improves the run-time of our algorithm. ... Summary. We run experiments to show that the empirical performance of our algorithms can be much superior to the worst-case guarantees claimed by our theorems (the codes are publicly available on GitHub)."
Researcher Affiliation: Academia
Deeparnab Chakrabarty, Department of Computer Science, Dartmouth College, Hanover, NH 03755, deepc@cs.dartmouth.edu
Maryam Negahbani, Department of Computer Science, Dartmouth College, Hanover, NH 03755, maryam@cs.dartmouth.edu
Pseudocode: Yes
"Algorithm 1 Filter ... Algorithm 2 Fair-Round: FAIR-(p, k)-CLUSTERING bi-criteria approximation ... Algorithm 3 Sparsification + Fair-Round"
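Algorithm 1 (Filter) is a greedy filtering step of the kind commonly used in LP rounding for clustering. Below is a minimal Python sketch of such a step; the processing order, the factor 2, and the name `filter_centers` are assumptions for illustration, not the paper's exact pseudocode.

```python
# A minimal sketch of a greedy filtering step in the style of
# Algorithm 1 (Filter). The ordering rule and the factor 2 are
# assumptions; see the paper for the exact pseudocode.
def filter_centers(points, R, dist):
    """Greedily open centers whose radius-balls are far apart.

    points: iterable of point ids; R: dict mapping point -> radius;
    dist: metric d(u, v). Points are processed in nondecreasing
    radius order; a point is opened unless some already-opened
    center lies within twice its radius.
    """
    centers = []
    for v in sorted(points, key=lambda u: R[u]):
        if all(dist(v, u) > 2 * R[v] for u in centers):
            centers.append(v)
    return centers
```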
Open Source Code: Yes
"Summary. We run experiments to show that the empirical performance of our algorithms can be much superior to the worst-case guarantees claimed by our theorems (the codes are publicly available on GitHub)." https://github.com/moonin12/individually-fair-k-clustering
Open Datasets: Yes
"Similar to [29], we use 3 datasets from the UCI repository [16] (also used in previous fair clustering works of [14, 6]). ... bank [35] with 4,521 points ... census [28] with 32,561 points ... diabetes [34] with 101,766 points"
Dataset Splits: No
The paper mentions using the "average of 10 random samples of size 1000 each" for its figures, which implies subsampling. However, it does not specify any explicit train, validation, or test splits (e.g., percentages, absolute counts, or references to predefined splits). A sketch of the quoted subsampling protocol follows.
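A minimal sketch of that protocol, assuming the metric being averaged and the seeding, neither of which the quote specifies:

```python
# A minimal sketch of "average of 10 random samples of size 1000 each".
# The evaluated metric and the seed are assumptions for illustration.
import numpy as np

def averaged_over_samples(data, evaluate, n_samples=10, size=1000, seed=0):
    """Average `evaluate` over random subsamples of `data` (a numpy array)."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_samples):
        idx = rng.choice(len(data), size=size, replace=False)
        scores.append(evaluate(data[idx]))
    return float(np.mean(scores))
```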
Hardware Specification: Yes
"We run our experiments in Python 3.8.2 on a MacBook Pro with 2.3 GHz 8-Core Intel Core i9 processor and 32 GB of DDR4 memory."
Software Dependencies: Yes
"We run our experiments in Python 3.8.2 on a MacBook Pro with 2.3 GHz 8-Core Intel Core i9 processor and 32 GB of DDR4 memory. We solve our linear programs using the Python API for CPLEX [22]."
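For context, here is a minimal sketch of solving a small LP with the CPLEX Python API cited above. The toy LP (two variables, one covering constraint) is an assumption for illustration, not the paper's actual relaxation.

```python
# A minimal sketch of the CPLEX Python API; this toy LP is an
# illustration, not the paper's fair-clustering relaxation.
import cplex

prob = cplex.Cplex()
prob.objective.set_sense(prob.objective.sense.minimize)

# Two fractional variables with unit costs 3 and 5, bounded in [0, 1].
prob.variables.add(obj=[3.0, 5.0], lb=[0.0, 0.0], ub=[1.0, 1.0],
                   names=["x0", "x1"])

# Covering-style constraint: x0 + x1 >= 1.
prob.linear_constraints.add(
    lin_expr=[cplex.SparsePair(ind=["x0", "x1"], val=[1.0, 1.0])],
    senses=["G"], rhs=[1.0], names=["cover"])

prob.solve()
print(prob.solution.get_objective_value())  # expected: 3.0
print(prob.solution.get_values())           # expected: [1.0, 0.0]
```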
Experiment Setup: Yes
"While implementing our algorithm, we make the following optimization: instead of choosing the constant 2 in Line 2 (definition of R) in our algorithm, we perform a binary search to determine a better constant. Specifically, we find the smallest $\beta$ for which Algorithm 1 gives at most $k$ centers with $R(v) := \min\{r(v), (\beta C_v)^{1/p}\}$ for all $v$. ... Here, $\delta = 0.3$, $0.05$, and $0.01$ for bank, census, and diabetes. Note that [25], k-Means++, and our sparsified algorithm Sparse-Fair-Round have very small differences in their running times."
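The binary search over $\beta$ quoted above admits a short sketch. Here `filter_num_centers` is a hypothetical wrapper around Algorithm 1 returning how many centers it opens, and the bracketing interval and tolerance are assumptions for illustration; the sketch also assumes feasibility is monotone in $\beta$ (larger $\beta$ gives larger radii, hence fewer centers).

```python
# A minimal sketch of the quoted binary search: find the smallest beta
# for which Filter opens at most k centers under
# R(v) = min(r(v), (beta * C_v) ** (1 / p)).
# `filter_num_centers` is a hypothetical wrapper around Algorithm 1;
# the initial bracket and tolerance are assumptions.

def smallest_feasible_beta(r, C, p, k, filter_num_centers,
                           lo=0.0, hi=2.0, tol=1e-4):
    """Binary-search the smallest beta with Filter opening <= k centers."""
    def feasible(beta):
        radii = {v: min(r[v], (beta * C[v]) ** (1.0 / p)) for v in r}
        return filter_num_centers(radii) <= k

    # Grow the upper bound until feasible (assumes some feasible beta exists).
    while not feasible(hi):
        hi *= 2.0
    # Shrink the bracket; hi stays feasible, lo stays infeasible.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi
```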