A Pairwise Fair and Community-preserving Approach to k-Center Clustering
Authors: Brian Brubach, Darshan Chakrabarti, John Dickerson, Samir Khuller, Aravind Srinivasan, Leonidas Tsepenekas
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we compare the effectiveness of our approach to classical k-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness. Beyond theoretical results, we further explore the algorithm experimentally in Section 3 on 40 different problem instances of a benchmark dataset to show that it performs as expected or better. On the benchmark problems, we illustrate in Figure 2 how tuning a parameter in our algorithm can adjust the trade-off between fairness and minimizing the cluster radius. In Section 4, we evaluate our algorithm on a real dataset over different target numbers of clusters. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park, Maryland, USA 2Computer Science Department, Wellesley College, Wellesley, MA, USA 3School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 4Department of Computer Science, Northwestern University, Evanston, IL, USA. |
| Pseudocode | Yes | Algorithm 1 FAIRALG |
| Open Source Code | Yes | The code for these experiments and those in Section 4 is available on Git Hub.1 |
| Open Datasets | Yes | We ran experiments on the well-known p-median dataset from OR-Lib (Beasley, 1990) which contains 40 different problem instances. We ran additional experiments on a sample of 1,000 points from the adult dataset (Kohavi & Becker, 1996). |
| Dataset Splits | No | The paper mentions using specific datasets but does not explicitly describe training, validation, or test dataset splits, percentages, or cross-validation setup. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts) are provided for the experimental setup. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions). |
| Experiment Setup | Yes | Fair algorithm implementation. Our implementation of the fair algorithm uses Scr to find the initial set of centers. ... We parameterize our algorithm with the mean, 1/λ, of the exponential distribution we sample from, where λ is the exponential parameter used in Algorithm 1. For our Exact fair algorithm we set λ = 1/RScr... For our Medium fair algorithm, we set λ = 4/RScr... Finally, for our Tight fair algorithm, we simply divide our mean by another factor of 4 to get λ = 16/RScr. In addition, our implementation makes two natural modifications to Algorithm 1 that do not affect the theoretical bounds. First, the list of centers found in Step 1 is uniformly randomly permuted before growing the clusters. Second, if we have to choose a new center point in Step 6, we choose the point in the cluster which minimizes the radius as opposed to any arbitrary point. |