A Pairwise Fair and Community-preserving Approach to k-Center Clustering

Authors: Brian Brubach, Darshan Chakrabarti, John Dickerson, Samir Khuller, Aravind Srinivasan, Leonidas Tsepenekas

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In experiments, we compare the effectiveness of our approach to classical k-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness. Beyond theoretical results, we further explore the algorithm experimentally in Section 3 on 40 different problem instances of a benchmark dataset to show that it performs as expected or better. On the benchmark problems, we illustrate in Figure 2 how tuning a parameter in our algorithm can adjust the trade-off between fairness and minimizing the cluster radius. In Section 4, we evaluate our algorithm on a real dataset over different target numbers of clusters.
Researcher Affiliation Academia 1Department of Computer Science, University of Maryland, College Park, Maryland, USA 2Computer Science Department, Wellesley College, Wellesley, MA, USA 3School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 4Department of Computer Science, Northwestern University, Evanston, IL, USA.
Pseudocode Yes Algorithm 1 FAIRALG
Open Source Code Yes The code for these experiments and those in Section 4 is available on Git Hub.1
Open Datasets Yes We ran experiments on the well-known p-median dataset from OR-Lib (Beasley, 1990) which contains 40 different problem instances. We ran additional experiments on a sample of 1,000 points from the adult dataset (Kohavi & Becker, 1996).
Dataset Splits No The paper mentions using specific datasets but does not explicitly describe training, validation, or test dataset splits, percentages, or cross-validation setup.
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts) are provided for the experimental setup.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions).
Experiment Setup Yes Fair algorithm implementation. Our implementation of the fair algorithm uses Scr to find the initial set of centers. ... We parameterize our algorithm with the mean, 1/λ, of the exponential distribution we sample from, where λ is the exponential parameter used in Algorithm 1. For our Exact fair algorithm we set λ = 1/RScr... For our Medium fair algorithm, we set λ = 4/RScr... Finally, for our Tight fair algorithm, we simply divide our mean by another factor of 4 to get λ = 16/RScr. In addition, our implementation makes two natural modifications to Algorithm 1 that do not affect the theoretical bounds. First, the list of centers found in Step 1 is uniformly randomly permuted before growing the clusters. Second, if we have to choose a new center point in Step 6, we choose the point in the cluster which minimizes the radius as opposed to any arbitrary point.