reproducibilityindex.ai

Probabilistic Fair Clustering

Authors: Seyed Esmaeili, Brian Brubach, Leonidas Tsepenekas, John Dickerson

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments are conducted using our proposed algorithms as well as baselines to validate our approach, and also surface nuanced concerns when group membership is not known deterministically. Finally (5), we verify our proposed approaches on four real-world datasets. We now evaluate the performance of our algorithms over a collection of real-world datasets.
Researcher Affiliation	Academia	(1) Department of Computer Science, University of Maryland, College Park; (2) Department of Computer Science, Wellesley College; (3) {esmaeili,ltsepene,john}@cs.umd.edu; (4) bb100@wellesley.edu
Pseudocode	Yes	Algorithm 1 Form Flow Network Edges for Cluster C_i. Algorithm 2 Algorithm for Large Cluster PFC(k, p).
Open Source Code	No	The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the methodology described.
Open Datasets	Yes	We use the Bank dataset Moro et al. [2014] which has 4,521 data points. We use two additional well-known datasets: Adult Kohavi [1996], with age being the fairness attribute, and Credit Card Yeh and Lien [2009], with credit being the fairness attribute. We use the Census1990 Meek et al. [2002] dataset.
Dataset Splits	No	The paper describes the datasets used and some preprocessing steps, and mentions training an SVM classifier, but it does not specify explicit train/validation/test splits for the main clustering experiments themselves. The text mentions sampling 100,000 points for SVM training and another 100,000 for prediction, but this is for generating probabilistic assignments, not for the clustering evaluation splits.
Hardware Specification	Yes	Python 3.6 on a MacBook Pro with 2.3GHz Intel Core i5 processor and 8GB 2133MHz LPDDR3 memory.
Software Dependencies	Yes	Python 3.6 on a MacBook Pro... A state-of-the-art commercial optimization toolkit, CPLEX Manual [2016], was used for solving all LPs. NetworkX Hagberg et al. [2013] was used to solve minimum cost flow problems, and Scikit-learn Pedregosa et al. [2011] for standard machine learning tasks such as training SVMs, pre-processing, and performing traditional k-means clustering.
Experiment Setup	Yes	We set δ = 0.2, as Bera et al. [2019] did, unless stated otherwise. Fig. 4 shows the output of our large cluster algorithm over 100,000 points and k = 5 clusters with varying lower bound assumptions.
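
To make the role of δ in the Experiment Setup row concrete, here is a minimal sketch of how a tolerance such as δ = 0.2 could translate into per-cluster proportion bounds when group membership is probabilistic. The summary above does not state the bound formulas; the form used here, lower = (1 − δ)·r and upper = r / (1 − δ) with r the dataset-wide group proportion, is an assumption modeled on the convention in Bera et al. [2019]. All function names and the toy data are hypothetical and are not taken from the paper's implementation.

```python
from collections import defaultdict


def proportion_bounds(r, delta):
    """Return (lower, upper) proportion bounds for a group.

    ASSUMPTION: lower = (1 - delta) * r and upper = r / (1 - delta),
    where r is the group's proportion in the whole dataset.
    """
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    return (1 - delta) * r, min(1.0, r / (1 - delta))


def expected_proportions(labels, probs):
    """Expected group proportion per cluster.

    labels[i] is the cluster of point i; probs[i] is the probability
    that point i belongs to the group of interest.
    """
    total = defaultdict(float)
    size = defaultdict(int)
    for cluster, p in zip(labels, probs):
        total[cluster] += p
        size[cluster] += 1
    return {cluster: total[cluster] / size[cluster] for cluster in size}


def violations(labels, probs, delta=0.2):
    """Clusters whose expected proportion falls outside the assumed bounds."""
    r = sum(probs) / len(probs)  # expected dataset-wide proportion
    lower, upper = proportion_bounds(r, delta)
    return {
        cluster: q
        for cluster, q in expected_proportions(labels, probs).items()
        if not lower <= q <= upper
    }


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    probs = [0.9, 0.9, 0.1, 0.1]
    print(violations(labels, probs))  # both clusters are flagged
```

With δ = 0.2 and r = 0.5, the assumed bounds are roughly [0.4, 0.625], so a clustering that separates high-probability points from low-probability points is flagged in both clusters, while one that mixes them evenly is not.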