reproducibilityindex.ai

Fair Algorithms for Clustering

Authors: Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, Maryam Negahbani

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that on established data sets, our algorithm performs much better in practice than what our theoretical results suggest.
Researcher Affiliation	Academia	Suman K. Bera UC Santa Cruz Santa Cruz, CA 95064 sbera@ucsc.edu Deeparnab Chakrabarty Dartmouth College Hanover, NH 03755 deeparnab@dartmouth.edu Nicolas J. Flores Dartmouth College Hanover, NH 03755 nicolasflores.19@dartmouth.edu Maryam Negahbani Dartmouth College Hanover, NH 03755 maryam@cs.dartmouth.edu
Pseudocode	Yes	Algorithm 1 Algorithm for the FAIR p-ASSIGNMENT problem
Open Source Code	No	The paper states 'We implement our algorithm in Python 3.6' but does not provide a link to the code or an explicit statement about its public availability.
Open Datasets	Yes	We use five datasets from the UCI repository [25]: (1) bank [54] with 4,521 points, corresponding to phone calls from a marketing campaign by a Portuguese banking institution. (2) census [43] with 32,561 points, representing information about individuals extracted from the 1994 US census. (3) diabetes [53] with 101,766 points, extracted from diabetes patient records. (4) creditcard [33] with 30,000 points, related to information on credit card holders from a certain credit card in Taiwan. (5) census1990 [47] with 2,458,285 points, taken from the 1990 US census, which we use for run time analysis. For each of the datasets, we select a set of numerical attributes to represent the records in the Euclidean space.
Dataset Splits	No	The paper does not explicitly provide percentages or specific details for training, validation, and test splits needed to reproduce the experiment's data partitioning. It mentions using datasets for 'experiments' and 'run time analysis' but not explicit splits.
Hardware Specification	Yes	We implement our algorithm in Python 3.6 and run all our experiments on a Macbook Air with a 1.8 GHz Intel Core i5 Processor and 8 GB 1600 MHz DDR3 memory.
Software Dependencies	Yes	We implement our algorithm in Python 3.6 and run all our experiments on a Macbook Air with a 1.8 GHz Intel Core i5 Processor and 8 GB 1600 MHz DDR3 memory. We use CPLEX [34] for solving LPs. [34] IBM. IBM ILOG CPLEX 12.9. 2019.
Experiment Setup	Yes	We set δ = 0.2 and k = 4. ... For vanilla k-center, we use a 2-approximation algorithm due to Gonzalez [30]. For vanilla k-median, we use the single-swap 5-approximation algorithm by Arya et al. [8], augment it with the D-sampling procedure by [7] for initial center selection, and take the best out of 5 trials. For k-means, we use the k-means++ implementation of [48].
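
The Experiment Setup row names a 2-approximation due to Gonzalez as the vanilla k-center baseline. The sketch below shows the standard farthest-first traversal usually associated with that result. It is an illustration only, not the authors' code: the function names, the choice of the first center, and the list-of-tuples input format are all assumptions made here.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def gonzalez_k_center(points, k, first=0):
    """Farthest-first traversal, a 2-approximation for vanilla k-center.

    points: list of equal-length numeric tuples (hypothetical input format).
    first:  index of the arbitrary starting center (an assumption here).
    Returns (center_indices, radius), where radius is the largest distance
    from any point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # dist[i] holds the distance from point i to its nearest center so far.
    dist = [euclidean(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, euclidean(p, points[nxt])) for d, p in zip(dist, points)]
    return centers, max(dist)


if __name__ == "__main__":
    toy = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(gonzalez_k_center(toy, k=2))
```

On the toy input, the traversal starts at index 0 and then picks the farthest point, so each of the two well-separated pairs ends up with one center. The report's setting of k = 4 would be passed the same way through the k argument.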