Probabilistic Fair Clustering
Authors: Seyed Esmaeili, Brian Brubach, Leonidas Tsepenekas, John Dickerson
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted using our proposed algorithms as well as baselines to validate our approach, and also surface nuanced concerns when group membership is not known deterministically. Finally (Section 5), we verify our proposed approaches on four real-world datasets. We now evaluate the performance of our algorithms over a collection of real-world datasets. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park; Department of Computer Science, Wellesley College; {esmaeili,ltsepene,john}@cs.umd.edu; bb100@wellesley.edu |
| Pseudocode | Yes | Algorithm 1: Form Flow Network Edges for Cluster C_i. Algorithm 2: Algorithm for Large Cluster PFC(k, p). |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | We use the Bank dataset Moro et al. [2014] which has 4,521 data points. We use two additional well-known datasets: Adult Kohavi [1996], with age being the fairness attribute, and Credit Card Yeh and Lien [2009], with credit being the fairness attribute. We use the Census1990 Meek et al. [2002] dataset. |
| Dataset Splits | No | The paper describes the datasets used and some preprocessing steps, and mentions training an SVM classifier, but it does not specify explicit train/validation/test splits for the main clustering experiments themselves. The text mentions sampling 100,000 points for SVM training and another 100,000 for prediction, but this is for generating probabilistic assignments, not for the clustering evaluation splits. |
| Hardware Specification | Yes | Python 3.6 on a MacBook Pro with a 2.3 GHz Intel Core i5 processor and 8 GB 2133 MHz LPDDR3 memory. |
| Software Dependencies | Yes | Python 3.6 on a MacBook Pro... A state-of-the-art commercial optimization toolkit, CPLEX Manual [2016], was used for solving all LPs. NetworkX Hagberg et al. [2013] was used to solve minimum cost flow problems, and Scikit-learn Pedregosa et al. [2011] for standard machine learning tasks such as training SVMs, pre-processing, and performing traditional k-means clustering. |
| Experiment Setup | Yes | We set δ = 0.2, as Bera et al. [2019] did, unless stated otherwise. Fig. 4 shows the output of our large cluster algorithm over 100,000 points and k = 5 clusters with varying lower bound assumptions. |
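The paper's code is not released, so the δ = 0.2 setting in the Experiment Setup row is easiest to interpret with a small reconstruction. The sketch below rests on two assumptions that are not confirmed by this report.

- The δ-based bounds follow the convention commonly attributed to Bera et al. [2019]: a lower proportion bound of r_h(1 − δ) and an upper bound of r_h/(1 − δ), where r_h is the overall proportion of group h.
- A cluster counts as probabilistically fair when the expected number of members of each group, i.e. the sum of membership probabilities, lies within those bounds times the cluster size.

The functions `fairness_bounds` and `is_probabilistically_fair` are hypothetical names used for illustration only.

```python
import numpy as np


def fairness_bounds(probs, delta=0.2):
    """Hypothetical helper: per-group (lower, upper) proportion bounds.

    ASSUMPTION: delta-based bounds r_h * (1 - delta) and r_h / (1 - delta),
    with r_h estimated as the mean membership probability of group h.
    probs has shape (n_points, n_groups); row j gives P(point j in group h).
    """
    r = probs.mean(axis=0)  # expected overall proportion of each group
    return r * (1.0 - delta), r / (1.0 - delta)


def is_probabilistically_fair(probs, labels, k, delta=0.2):
    """Hypothetical check: is the expected group count in every cluster
    within [lower * |C_i|, upper * |C_i|]?"""
    lower, upper = fairness_bounds(probs, delta)
    for i in range(k):
        members = probs[labels == i]
        size = len(members)
        if size == 0:
            continue  # an empty cluster imposes no constraint
        expected = members.sum(axis=0)  # expected count of each group in C_i
        if np.any(expected < lower * size) or np.any(expected > upper * size):
            return False
    return True


if __name__ == "__main__":
    # Toy data: 4 points, 2 groups; the overall proportions are 0.5 and 0.5.
    probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
    print(is_probabilistically_fair(probs, np.array([0, 0, 1, 1]), k=2))
    print(is_probabilistically_fair(probs, np.array([0, 1, 0, 1]), k=2))
```

With δ = 0.2, both groups in this toy example get bounds of 0.4 and 0.625. Grouping points {0, 1} and {2, 3} keeps every expected count inside [0.8, 1.25]. Grouping points {0, 2} gives an expected group-0 count of 1.5, which exceeds the upper bound.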