Probabilistic Fair Clustering

Authors: Seyed Esmaeili, Brian Brubach, Leonidas Tsepenekas, John Dickerson

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted using our proposed algorithms as well as baselines to validate our approach, and also surface nuanced concerns when group membership is not known deterministically. Finally (5), we verify our proposed approaches on four real-world datasets. We now evaluate the performance of our algorithms over a collection of real-world datasets.
Researcher Affiliation | Academia | 1: Department of Computer Science, University of Maryland, College Park; 2: Department of Computer Science, Wellesley College; 3: {esmaeili,ltsepene,john}@cs.umd.edu; 4: bb100@wellesley.edu
Pseudocode | Yes | Algorithm 1: Form Flow Network Edges for Cluster C_i. Algorithm 2: Algorithm for Large Cluster PFC(k, p).
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | We use the Bank dataset Moro et al. [2014] which has 4,521 data points. We use two additional well-known datasets: Adult Kohavi [1996], with age being the fairness attribute, and Credit Card Yeh and Lien [2009], with credit being the fairness attribute. We use the Census1990 Meek et al. [2002] dataset.
Dataset Splits | No | The paper describes the datasets and some preprocessing steps, and mentions training an SVM classifier, but it does not specify explicit train/validation/test splits for the main clustering experiments. It mentions sampling 100,000 points for SVM training and another 100,000 for prediction, but these are used to generate probabilistic assignments, not as splits for the clustering evaluation.
Hardware Specification | Yes | Python 3.6 on a MacBook Pro with 2.3GHz Intel Core i5 processor and 8GB 2133MHz LPDDR3 memory.
Software Dependencies | Yes | Python 3.6 on a MacBook Pro... A state-of-the-art commercial optimization toolkit, CPLEX Manual [2016], was used for solving all LPs. NetworkX Hagberg et al. [2013] was used to solve minimum cost flow problems, and Scikit-learn Pedregosa et al. [2011] for standard machine learning tasks such as training SVMs, pre-processing, and performing traditional k-means clustering.
Experiment Setup | Yes | We set δ = 0.2, as Bera et al. [2019] did, unless stated otherwise. Fig. 4 shows the output of our large cluster algorithm over 100,000 points and k = 5 clusters with varying lower bound assumptions.
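
The Experiment Setup entry reports only the tolerance δ = 0.2. As a rough illustration of how such a tolerance could become a per-cluster check when group membership is known only as a probability, here is a minimal Python sketch. The bound form (1 - δ)·r and (1 + δ)·r, the function names, and the toy data are all assumptions made for illustration; they are not taken from the paper, whose exact bounds may differ.

```python
from typing import Dict, List, Tuple


def proportion_bounds(r: float, delta: float = 0.2) -> Tuple[float, float]:
    """Return (lower, upper) bounds on a group's expected share of a cluster.

    ASSUMPTION: bounds are (1 - delta) * r and (1 + delta) * r, where r is the
    group's share of the whole dataset. The paper's exact form may differ.
    """
    return (1.0 - delta) * r, min(1.0, (1.0 + delta) * r)


def expected_shares(probs: List[float], labels: List[int]) -> Dict[int, float]:
    """Expected fraction of each cluster that belongs to the group.

    probs[j] is the probability that point j belongs to the group;
    labels[j] is the cluster that point j is assigned to.
    """
    totals: Dict[int, float] = {}
    sizes: Dict[int, int] = {}
    for p, c in zip(probs, labels):
        totals[c] = totals.get(c, 0.0) + p
        sizes[c] = sizes.get(c, 0) + 1
    return {c: totals[c] / sizes[c] for c in totals}


def violations(probs: List[float], labels: List[int],
               delta: float = 0.2) -> Dict[int, float]:
    """Return the clusters whose expected group share falls outside the bounds."""
    r = sum(probs) / len(probs)  # expected share of the group in the dataset
    lo, hi = proportion_bounds(r, delta)
    shares = expected_shares(probs, labels)
    return {c: s for c, s in shares.items() if not lo <= s <= hi}


# Hypothetical toy data: six points, three clusters.
toy_probs = [0.9, 0.1, 0.8, 0.2, 0.5, 0.5]
balanced = violations(toy_probs, [0, 0, 1, 1, 2, 2])
skewed = violations(toy_probs, [0, 1, 0, 1, 2, 2])
```

In the toy data, the first assignment pairs a likely member with an unlikely one in every cluster, so no cluster is flagged. The second assignment groups the two likely members together, so clusters 0 and 1 are flagged.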