reproducibilityindex.ai

Making Existing Clusterings Fairer: Algorithms, Complexity Results and Insights

Authors: Ian Davidson, S. S. Ravi (pp. 3733-3740)

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on Twitter, Census and NYT data sets show that our methods can modify existing clusterings for data sets in excess of 100,000 instances within minutes on laptops and find as fair but higher quality clusterings than fair by design clustering algorithms.
Researcher Affiliation	Academia	Ian Davidson,1 S. S. Ravi2 1Computer Science Department, University of California, Davis 2Biocomplexity Institute & Initiative, University of Virginia and Computer Science Department, University at Albany SUNY
Pseudocode	No	No explicit pseudocode or algorithm block was found. The paper refers to a technical report (Davidson and Ravi 2019) for algorithm details.
Open Source Code	No	No explicit statement about releasing the source code for the described methodology or a link to a code repository was found.
Open Datasets	Yes	Here we first analyze the well studied Adult dataset (e.g., (Chierichetti et al. 2017; Backurs et al. 2019)) that consists of 48,842 individuals (males 66.8%, females 33.2%) from the UCI repository (Dheeru and Karra Taniskidou 2017).
Dataset Splits	No	The paper does not explicitly state specific training, validation, or test dataset splits (e.g., percentages, sample counts, or detailed splitting methodology).
Hardware Specification	Yes	The mean run time over 100 experiments on a single core of a Mac Book Pro laptop (i5 processor) for a randomly created subset of the data sets.
Software Dependencies	No	The paper mentions software tools like MATLAB and BOW toolkit, but does not provide specific version numbers for any key software components or libraries required for reproduction.
Experiment Setup	Yes	To make these first two clusters fairer we apply our method by placing bounds on the first and second cluster's protected status ratios to be 0.5 ± 0.05 with the remaining clusters' proportion of females to be their current values as reported in Table 2 ± 0.15. This is achieved by setting the U_i and L_i bounds in Equations (2) and (3). For each data set we find the best k = 10 clustering using plain k-means and spectral clustering + k-means (both from 1000 random restarts).
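
The Experiment Setup entry describes per-cluster lower and upper bounds on the protected-status ratio. As a rough illustration only, the Python sketch below computes each cluster's protected ratio and checks it against such bounds. It is not the authors' modification algorithm, which is not reproduced on this page. The function names, the toy data, and the reading of the quoted setup as an interval of 0.45 to 0.55 around 0.5 are all assumptions made for this example.

```python
from collections import defaultdict


def protected_ratios(labels, protected):
    """Return {cluster_id: fraction of members with protected status}.

    labels    -- cluster id of each instance
    protected -- 1/True if the instance has the protected status, else 0/False
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cluster, flag in zip(labels, protected):
        totals[cluster] += 1
        if flag:
            hits[cluster] += 1
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


def within_bounds(ratios, bounds):
    """Check each bounded cluster: lower <= ratio <= upper.

    bounds maps cluster_id -> (lower, upper); clusters that are not
    listed are treated as unconstrained (a choice made for this sketch).
    """
    return {
        cluster: lower <= ratios[cluster] <= upper
        for cluster, (lower, upper) in bounds.items()
    }


# Hypothetical toy clustering: 8 instances in 2 clusters.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
protected = [1, 1, 0, 0, 1, 0, 0, 0]

ratios = protected_ratios(labels, protected)

# Assumed bounds: both clusters must have a ratio in [0.45, 0.55].
bounds = {0: (0.45, 0.55), 1: (0.45, 0.55)}
check = within_bounds(ratios, bounds)

print(ratios)
print(check)
```

In this toy case cluster 0 has a ratio of 0.5 and satisfies the bounds, while cluster 1 has a ratio of 0.25 and does not, so it is the kind of cluster a fairness-improving modification would need to change.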