Fair Algorithms for Clustering
Authors: Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, Maryam Negahbani
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that on established data sets, our algorithm performs much better in practice than what our theoretical results suggest. |
| Researcher Affiliation | Academia | Suman K. Bera, UC Santa Cruz, Santa Cruz, CA 95064, sbera@ucsc.edu; Deeparnab Chakrabarty, Dartmouth College, Hanover, NH 03755, deeparnab@dartmouth.edu; Nicolas J. Flores, Dartmouth College, Hanover, NH 03755, nicolasflores.19@dartmouth.edu; Maryam Negahbani, Dartmouth College, Hanover, NH 03755, maryam@cs.dartmouth.edu |
| Pseudocode | Yes | Algorithm 1 Algorithm for the FAIR p-ASSIGNMENT problem |
| Open Source Code | No | The paper states 'We implement our algorithm in Python 3.6' but does not provide a link to the code or an explicit statement about its public availability. |
| Open Datasets | Yes | We use five datasets from the UCI repository [25]: (1) bank [54] with 4,521 points, corresponding to phone calls from a marketing campaign by a Portuguese banking institution. (2) census [43] with 32,561 points, representing information about individuals extracted from the 1994 US census. (3) diabetes [53] with 101,766 points, extracted from diabetes patient records. (4) creditcard [33] with 30,000 points, related to information on credit card holders from a certain credit card in Taiwan. (5) census1990 [47] with 2,458,285 points, taken from the 1990 US census, which we use for run time analysis. For each of the datasets, we select a set of numerical attributes to represent the records in the Euclidean space. |
| Dataset Splits | No | The paper does not explicitly provide percentages or specific details for training, validation, and test splits needed to reproduce the experiment's data partitioning. It mentions using datasets for 'experiments' and 'run time analysis' but not explicit splits. |
| Hardware Specification | Yes | We implement our algorithm in Python 3.6 and run all our experiments on a MacBook Air with a 1.8 GHz Intel Core i5 Processor and 8 GB 1600 MHz DDR3 memory. |
| Software Dependencies | Yes | We implement our algorithm in Python 3.6 and run all our experiments on a MacBook Air with a 1.8 GHz Intel Core i5 Processor and 8 GB 1600 MHz DDR3 memory. We use CPLEX [34] for solving LPs. [34] IBM. IBM ILOG CPLEX 12.9. 2019. |
| Experiment Setup | Yes | We set δ = 0.2 and k = 4. ... For vanilla k-center, we use a 2-approximation algorithm due to Gonzalez [30]. For vanilla k-median, we use the single-swap 5-approximation algorithm by Arya et al. [8], augment it with the D-sampling procedure by [7] for initial center selection, and take the best out of 5 trials. For k-means, we use the k-means++ implementation of [48]. |
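
The experiment setup names a 2-approximation for vanilla k-center due to Gonzalez [30]. Since the authors' code is not available, the following is only a minimal sketch of the standard farthest-first traversal usually associated with that result, not the paper's implementation. The function name `gonzalez_k_center`, the `first` parameter, and the toy points are assumptions made for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def gonzalez_k_center(points, k, first=0):
    """Farthest-first traversal (hypothetical helper, not the authors' code).

    Returns (center_indices, radius), where radius is the largest distance
    from any point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # dist[i] holds the distance from point i to its nearest chosen center.
    dist = [euclidean(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            d = euclidean(p, points[nxt])
            if d < dist[i]:
                dist[i] = d
    return centers, max(dist)


if __name__ == "__main__":
    toy = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(gonzalez_k_center(toy, k=2))
```

Each round costs one pass over the data, so the sketch runs in O(nk) distance computations. The choice of the first center is arbitrary in this sketch; the 2-approximation guarantee of farthest-first traversal does not depend on it.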